Estimation and bootstrap for stochastically monotone Markov processes

Neumann, Michael H.

doi:10.1007/s00184-023-00903-7

Estimation and bootstrap for stochastically monotone Markov processes

Open access
Published: 28 February 2023

Volume 87, pages 31–59, (2024)
Cite this article

Download PDF

You have full access to this open access article

Metrika Aims and scope Submit manuscript

Estimation and bootstrap for stochastically monotone Markov processes

Download PDF

Michael H. Neumann ORCID: orcid.org/0000-0002-5783-831X¹

1080 Accesses
Explore all metrics

Abstract

The Markov property is shared by several popular models for time series such as autoregressive or integer-valued autoregressive processes as well as integer-valued ARCH processes. A natural assumption which is fulfilled by corresponding parametric versions of these models is that the random variable at time t gets stochastically greater conditioned on the past, as the value of the random variable at time $t-1$ increases. Then the associated family of conditional distribution functions has a certain monotonicity property which allows us to employ a nonparametric antitonic estimator. This estimator does not involve any tuning parameter which controls the degree of smoothing and is therefore easy to apply. Nevertheless, it is shown that it attains a rate of convergence which is known to be optimal in similar cases. This estimator forms the basis for a new method of bootstrapping Markov chains which inherits the properties of simplicity and consistency from the underlying estimator of the conditional distribution function.

Martingale Estimating Functions for Stochastic Processes: A Review Toward a Unifying Tool

Bootstrap for Maximum Likelihood Estimates of PARMA Coefficients

Approximating Markov Chains for Bootstrapping and Simulation

1 Introduction

We consider a time-homogeneous Markov chain $(X_t)_{t\in {\mathbb N}_0}$ driven by a transition kernel which satisfies a certain monotonicity property: the conditional distribution of the random variable at time t gets stochastically greater as the value of the variable at time $t-1$ increases. Such a condition is actually satisfied by several popular models for time series such as autoregressive or integer-valued autoregressive as well as integer-valued ARCH processes under natural assumptions on the involved parameters. To be specific, we assume that, for each fixed z, $F_x(z):=P(X_t\le z\mid X_{t-1}=x)$ is antitonic (monotonically non-increasing) in x. This assumption allows us to employ a nonparametric antitonic estimator $\widehat{F}_x(z)$ of the function $x\mapsto F_x(z)$. Our estimator does not involve any tuning parameter which controls the degree of smoothing and is therefore easy to apply. Moreover, its consistency does not require smoothness properties of the function $x\mapsto F_x(z)$; the postulated monotonicity suffices. Theorem 2.1 states that the estimator $\widehat{F}_x(z)$ converges in $L^1$ norm, weighted by the stationary distribution of the Markov chain, with a rate of $n^{-1/3}$ which is believed to be the optimal one.

The estimator of $F_x(z)$ serves as a basis for a new bootstrap method for Markov chains. Among several other methods, those proposed by Rajarshi (1990) and Paparoditis and Politis (2002) are the closest ones to our proposal. While Rajarshi’s bootstrap procedure is based on a nonparametric estimate of the one-step transition density, Paparoditis and Politis (2002) used in their so-called local bootstrap a local resampling of the original data set. In both papers, the proof of consistency of the respective bootstrap method is based on the assumption of a smooth transition density. In contrast, our approach does not require any smoothness assumption on the transition mechanism; it is merely based on the monotonicity assumption on the Markov kernel. We show its applicability for Markov chains with state space ${\mathbb N}_0=\{0,1,2,\ldots \}$. Consistency of bootstrap can be shown in a most transparent way by a so-called coupling of the original process and its bootstrap counterpart, i.e. we define versions $(\widetilde{X}_t)_{t\in {\mathbb N}_0}$ and $(\widetilde{X}_t^*)_{t\in {\mathbb N}_0}$ of these processes on a common probability space $(\widetilde{\Omega },\widetilde{\mathcal A},\widetilde{P})$ such that the corresponding random variables $\widetilde{X}_t$ and $\widetilde{X}_t^*$ are equal with a high probability. Somewhat surprisingly, this natural approach was rarely used in statistics. Using Mallows metric to measure the distance between variables from the original and the bootstrap process, it was implicitly employed in the context of independent random variables by Bickel and Freedman (1981) and Freedman (1981). A more explicit use of coupling was made, in the context of U- and V-statistics, but again in the independent case, by Dehling and Mikosch (1994) and Leucht and Neumann (2009). For dependent data, this approach was adopted by Leucht and Neumann (2013), Leucht et al. (2015), and Neumann (2021). Our second main result, Theorem 3.1, describes the results of our coupling approach. The stationary distribution $P^*_{X^*}$ of the bootstrap process converges in total variation norm and in probability to that of the original process. The coupled process is $\phi $-mixing with coefficients decaying at an exponential rate and the corresponding values $\widetilde{X}_t^0$ and $\widetilde{X}_t^{*,0}$ of a stationary version of the coupled process coincide with a probability converging to 1. These general results can then be used to prove bootstrap consistency for specific statistics. The proofs of our main theorems and some auxiliary results a postponed to a final Sect. 4.

2 An estimator of a monotone family of distribution functions

Suppose that we observe random variables $X_0,X_1,\ldots ,X_n$, where ${\textbf{X}}=(X_t)_{t\in {\mathbb N}_0}$ is a strictly stationary Markov chain with state space $D\subseteq {\mathbb R}$, defined on a probability space $(\Omega ,{\mathcal A},P)$. We denote the stationary distribution by $P_X$ and the corresponding distribution function by $F_X$. Let $(F_x)_{x\in {\mathbb R}}$ defined as $F_x(z)=P(X_t\le z\mid X_{t-1}=x)$ be the corresponding family of conditional distribution functions. We impose the following as our key assumption.

(A1)
For each $z\in {\mathbb R}$, the function $x\mapsto F_x(z)$ is monotonically non-increasing, i.e. if $x_1< x_2$, then $P(X_t\le z\mid X_{t-1}=x_1)\ge P(X_t\le z\mid X_{t-1}=x_2)$. In addition we suppose that
(A2)
${\textbf{X}}=(X_t)_{t\in {\mathbb N}_0}$ is strong mixing with exponentially decaying coefficients $\alpha _X(k)$, i.e.
$$\begin{aligned} \alpha _X(k) \,=\, O\big ( \rho ^k \big ), \end{aligned}$$
for some $\rho \in [0,1)$.

Assumption (A1) may be paraphrased as follows. If $x_1<x_2$ and if $Y_1$ and $Y_2$ are random variables following the respective conditional distributions $P^{X_t\mid X_{t-1}=x_1}$ and $P^{X_t\mid X_{t-1}=x_2}$, then $Y_2$ is stochastically not smaller than $Y_1$. It turns out that this assumption is actually satisfied by popular classes of Markov chain models under natural assumptions. Here is a list of models we have in mind:

(1)
Nonlinear autoregressive processes with non-decreasing link The process ${\textbf{X}}=(X_t)_{t\in {\mathbb N}_0}$ is assumed to obey the model equation
$$\begin{aligned} X_t \,=\, f(X_{t-1}) \,+\, \varepsilon _t \qquad \forall t\in {\mathbb N}, \end{aligned}$$
where $(\varepsilon _t)_{t\in {\mathbb N}}$ is a sequence of i.i.d. random variables and $\varepsilon _t$ is independent of $X_{t-1},\ldots ,X_0$. If the function $f:\,{\mathbb R}\rightarrow {\mathbb R}$ is monotonically non-decreasing, then, for $x_1<x_2$,
$$\begin{aligned} P\big (X_t\le z\mid X_{t-1}=x_1\big )= & {} P\big (\varepsilon _t\le z-f(x_1)\big ) \,\ge \, P\big (\varepsilon _t\le z-f(x_2)\big )\\= & {} P\big (X_t\le z\mid X_{t-1}=x_2\big ). \end{aligned}$$
Furthermore, if $\varepsilon _t$ has an everywhere positive density and if
$$\begin{aligned} \big | f(x) \big | \,\le \, \gamma |x| \,-\, \epsilon \qquad \forall x\ge K, \end{aligned}$$
for some $\gamma <1$, $\epsilon >0$, and $K<\infty $, then the process ${\textbf{X}}$ has a unique stationary distribution and satisfies (A2); see e.g. Doukhan (1994).
(2)
Branching processes with immigration Let $X_0$, $(Z_{t,k})_{t,k\in {\mathbb N}}$ and $(\varepsilon _t)_{t\in {\mathbb N}}$ be mutually independent random variables taking values in ${\mathbb N}_0$. We assume that $(Z_{t,k})_{t,k\in {\mathbb N}}$ as well as $(\varepsilon _t)_{t\in {\mathbb N}}$ are sequences of identically distributed random variables. Then the process ${\textbf{X}}=(X_t)_{t\in {\mathbb N}_0}$ given by
$$\begin{aligned} X_t \,=\, \sum _{k=1}^{X_{t-1}} Z_{t,k} \,+\, \varepsilon _t \qquad \forall t\in {\mathbb N}\end{aligned}$$
is a branching process with immigration. In the special case of $Z_{t,k}\sim \hbox {Bin}(1,\alpha )$ we obtain a so-called first-order integer-valued autoregressive (INAR(1)) process which was proposed by McKenzie (1985) and Al-Osh and Alzaid (1987). Since the $Z_{t,k}$ are non-negative random variables, it is obvious that (A1) is fulfilled. If in addition $E\varepsilon _t<\infty $ and $EZ_{t,k}<1$, then ${\textbf{X}}$ has a unique stationary distribution and satisfies (A2); see Pakes (1971).
(3)
Poisson-INARCH processes The process ${\textbf{X}}=(X_t)_{t\in {\mathbb N}_0}$ is an integer-valued ARCH process of order 1 with Poisson innovations (Poisson-INARCH(1)) if
$$\begin{aligned} X_t\mid {\mathcal F}_{t-1} \sim \hbox {Poisson}\big ( f(X_{t-1}) \big ), \end{aligned}$$
where ${\mathcal F}_s$ denotes the $\sigma $-algebra generated by $X_0,\ldots ,X_s$. If f is monotonically non-decreasing, then we obtain, for $x_1<x_2$ and $Y_1\sim \hbox {Poisson}(f(x_1))$, $Y_2\sim \hbox {Poisson}(f(x_2))$,
$$\begin{aligned} P( X_t\le z\mid X_{t-1}=x_1 ) \,=\, P( Y_1\le z) \,\ge \, P( Y_2\le z) \,=\, P( X_t\le z\mid X_{t-1}=x_1 ), \end{aligned}$$
i.e., (A1) is fulfilled. Furthermore, if in addition
$$\begin{aligned} f(x) \,\le \, \gamma x \,-\, \epsilon \qquad \forall x\ge K, \end{aligned}$$
for some $\gamma <1$, $\epsilon >0$, and $K<\infty $, then ${\textbf{X}}$ has a unique stationary distribution and satisfies (A2); see e.g. Theorem 2 in Doukhan (1994, Sec. 2.4, p. 90).

We consider an estimator of $F_x(z)=P\big ( X_t\le z\mid X_{t-1}=x\big )$ which takes into account that the function $x\mapsto F_x(z)$ is monotonically non-increasing under (A1). Nonparametric estimators of monotone functions have a long history and were proposed e.g. by Brunk (1955) and Ayer et al. (1955). Denote by ${\mathbb {1}}(\cdot )$ the indicator function. For $z\in D$ and $x\in \{X_0,\ldots ,X_{n-1}\}$, we define

$$\begin{aligned} \widehat{F}_x^{(\max -\min )}(z):=\, \max _{v:\, v\ge x} \; \min _{u:\, u\le x} \frac{\sum _{t=1}^n {\mathbb {1}}\big ( X_t\le z, X_{t-1}\in [u,v]\big )}{\#\{t\le n:\, X_{t-1}\in [u,v]\}} \end{aligned}$$

(2.1a)

and

$$\begin{aligned} \widehat{F}_x^{(\min -\max )}(z):=\, \min _{u:\, u\le x} \; \max _{v:\, v\ge x} \frac{\sum _{t=1}^n {\mathbb {1}}\big ( X_t\le z, X_{t-1}\in [u,v]\big )}{\#\{t\le n:\, X_{t-1}\in [u,v]\}}. \end{aligned}$$

(2.1b)

It is well-known that $\widehat{F}_x^{(\max -\min )}(z)=\widehat{F}_x^{(\min -\max )}(z)$ for all $x\in \{X_0,\ldots ,X_{n-1}\}$, see e.g. Theorem 1 in Brunk (1955) and Theorem 1.4.4 in Robertson, Wright, and Dykstra (1988, p. 23). As pointed out by Deng and Zhang (2020), (2.1a) and (2.1b) have to be modified for $x\not \in \{X_0,\ldots ,X_{n-1}\}$. Since it could well happen that an interval with $x\in [u,v]$ does not contain any point from the collection $\{X_0,\ldots ,X_{n-1}\}$ we set $n_{u,v}=\#\{t\le n:\, X_{t-1}\in [u,v]\}$, $n_{u,*}=\#\{t\le n:\, u\le X_{t-1}\}$, $n_{*,v}=\#\{t\le n:\, X_{t-1}\le v\}$, and define

$$\begin{aligned} \widehat{F}_x^{(\max -\min )}(z):=\, \max _{v:\, v\ge x,\,n_{*,v}>0} \; \min _{u:\, u\le x,\, n_{u,v}>0} \frac{\sum _{t=1}^n {\mathbb {1}}\big ( X_t\le z, X_{t-1}\in [u,v]\big )}{\#\{t\le n:\, X_{t-1}\in [u,v]\}} \nonumber \\ \end{aligned}$$

(2.2a)

and

$$\begin{aligned} \widehat{F}_x^{(\min -\max )}(z):=\, \min _{u:\, u\le x,\,n_{u,*}>0} \; \max _{v:\, v\ge x,\,n_{u,v}>0} \frac{\sum _{t=1}^n {\mathbb {1}}\big ( X_t\le z, X_{t-1}\in [u,v]\big )}{\#\{t\le n:\, X_{t-1}\in [u,v]\}}.\nonumber \\ \end{aligned}$$

(2.2b)

The estimators $\widehat{F}_x^{(\max -\min )}(z)$ and $\widehat{F}_x^{(\min -\max )}(z)$ are both non-increasing in x as the maxima are taken over non-increasing classes indexed by x and the minima over non-decreasing classes. Furthermore, for fixed $x\in D$, the mappings $z\mapsto \widehat{F}_x^{(\max -\min )}(z)$ and $z\mapsto \widehat{F}_x^{(\min -\max )}(z)$ are non-decreasing which follows from the isotonicity of the functions $z\mapsto {\mathbb {1}}(X_t\le z, X_{t-1}\in [u,v])$. Furthermore, if $X_{[1]},\ldots ,X_{[n]}$ is an enumeration of the values in $\{X_1,\ldots ,X_n\}$ in non-decreasing order, then it follows that, again for fixed $x\in D$, the mappings $z\mapsto \widehat{F}_x^{(\max -\min )}(z)$ and $z\mapsto \widehat{F}_x^{(\min -\max )}(z)$ are constant on the half-open intervals $[X_{[k]},X_{[k+1]})$ ($k=1,\ldots ,n-1$), and attain the respective values 0 and 1 on $(-\infty ,X_{[1]})$ and $[X_{[n]},\infty )$. Hence, these estimators are genuine probability distribution functions.

We choose as our estimator of $F_x(z)$

$$\begin{aligned} \widehat{F}_x(z):=\, \big ( \widehat{F}_x^{(\max -\min )}(z) \,+\, \widehat{F}_x^{(\min -\max )}(z) \big )/2. \end{aligned}$$

It follows that all of the above properties of $\widehat{F}_x^{(\max -\min )}(z)$ and $\widehat{F}_x^{(\min -\max )}(z)$ are inherited by $\widehat{F}_x(z)$. Its performance is characterized by the following theorem.

Theorem 2.1

Suppose that (A1) and (A2) are fulfilled. Then

$$\begin{aligned} \sup _z \left\{ E\left[ \int _D \big | \widehat{F}_x(z) \,-\, F_x(z) \big | \, dP_X(x) \right] \right\} \,=\, O\big ( n^{-1/3} \big ). \end{aligned}$$

The rate of convergence $n^{-1/3}$ is known to be optimal in related problems of estimating a monotone function on the basis of independent random variables; see e.g. Durot (2002, Theorem 1) and Zhang (2002, Theorem 2.3). We believe that this rate cannot be improved in our more delicate case of time series data. Note that Mösching and Dümbgen (2020) considered a nonparametric antitonic estimator of $F_x$ in a regression context where the dependent variables, conditional on the regressors, are independent. They derived under additional Hölder conditions rates of uniform and pointwise convergence for this estimator.

Our approach to prove this result can be most easily explained if the distribution function $F_X$ is continuous. We split the domain D into $k_n=\lfloor n^{1/3}\rfloor $ intervals $I_k=[x_{k-1},x_k)$, where $x_0=-\infty $ if $D={\mathbb R}$, $x_0=0$ if $D={\mathbb N}_0$ and, in both cases, $x_k=F_X^{-1}(k/k_n)=\sup \{x:\, F_X(x)\ge k/k_n\}$ for $k=1,\ldots ,k_n-1$, $x_{k_n}=\infty $. (As usual, $\lfloor a\rfloor $ denotes the largest integer less than or equal to a.) We can expect a favorable behavior of $\widehat{F}_x(z)$ if $N_k(\omega ):=\#\{t\le n:\, X_{t-1}(\omega )\in I_k\}$ is sufficiently large for all k. Let

$$\begin{aligned} A_n \,=\, \big \{\omega :\, N_k(\omega )\ge n/(2k_n) \quad \hbox {for all }\; k=1,\ldots ,k_n\big \}. \end{aligned}$$

It follows from Lemma 4.2 that $P(A_n^c)=O(n^{-1/3})$. Since $\int _D \big | \widehat{F}_x(z) \,-\, F_x(z) \big | \, dP_X(x)\le 1$ holds with probability 1, we obtain that

$$\begin{aligned} E\left[ \int _D \big | \widehat{F}_x(z) \,-\, F_x(z) \big | \, dP_X(x) \; {\mathbb {1}}_{A_n^c}\right] \,\le \, P\big (A_n^c\big ) \,=\, O\big ( n^{-1/3} \big ). \end{aligned}$$

(2.3)

To estimate $E\big [ \int _D \big (\widehat{F}_x(z) - F_x(z)\big )_+ \, dP_X(x) \; {\mathbb {1}}_{A_n}\big ]$ we proceed as follows. For $x\in I_k$, $k\in \{2,\ldots ,k_n\}$, we use the estimate

$$\begin{aligned}{} & {} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, {\mathbb {1}}_{A_n} \le \big ( \widehat{F}_{x_{k-1}}(z) \,-\, F_{x_k}(z) \big )^+ \, {\mathbb {1}}_{A_n} \\{} & {} \quad \le \max _{v:\, v\ge x_{k-1}} \left\{ \frac{ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z) \big ]\; {\mathbb {1}}(X_{t-1}\in [x_{k-2},v]) \big | }{ \#\{t\le n:\, X_{t-1}\in [x_{k-2},v]\} } \; {\mathbb {1}}_{A_n} \right\} \\{} & {} \qquad +\, \big ( F_{x_{k-2}}(z) \,-\, F_{x_k}(z) \big ). \end{aligned}$$

We obtain from Lemma 4.3 that

$$\begin{aligned}{} & {} E\left[ \max _{v:\, v\ge x_{k-1}} \left\{ \frac{ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z) \big ]\; {\mathbb {1}}(X_{t-1}\in [x_{k-2},v]) \big | }{ \#\{t\le n:\, X_{t-1}\in [x_{k-2},v]\} } \; {\mathbb {1}}_{A_n} \right\} \right] \nonumber \\{} & {} \quad =\, O\big ( n^{-1/3} \big ). \end{aligned}$$

Since

$$\begin{aligned} \sum _{k=2}^{k_n} \big ( F_{x_{k-2}}(z) \,-\, F_{x_k}(z) \big )= & {} \sum _{k=2}^{k_n} \big ( F_{x_{k-2}}(z) \,-\, F_{x_{k-1}}(z) \big ) \,+\, \sum _{k=2}^{k_n} \big ( F_{x_{k-1}}(z) \,-\, F_{x_k}(z) \big ) \\= & {} \big ( F_{x_0}(z) \,-\, F_{x_{k_n-1}}(z) \big ) \,+\, \big ( F_{x_1}(z) \,-\, F_{x_{k_n}}(z) \big ) \,\le \, 2. \end{aligned}$$

we conclude that

$$\begin{aligned} \sum _{k=2}^{k_n} E\left[ \int _{I_k} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, dP_X(x) \; {\mathbb {1}}_{A_n} \right] \,=\, O\big ( n^{-1/3} \big ). \end{aligned}$$

Furthermore, the rough estimate

$$\begin{aligned} E\left[ \int _{I_1} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, dP_X(x) \; {\mathbb {1}}_{A_n} \right] \,\le \, P_X( I_1 ) \,\le \, n^{-1/3} \end{aligned}$$

is obviously true, which leads to

$$\begin{aligned} E\left[ \int _{D} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, dP_X(x) \; {\mathbb {1}}_{A_n} \right] \,=\, O\big ( n^{-1/3} \big ). \end{aligned}$$

(2.4)

We can prove

$$\begin{aligned} E\left[ \int _{D} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^- \, dP_X(x) \; {\mathbb {1}}_{A_n} \right] \,=\, O\big ( n^{-1/3} \big ). \end{aligned}$$

(2.5)

in complete analogy to (2.4). The result stated in Theorem 2.1 follows from (2.3) to (2.5). In the general case we have to take into account that the distribution function $F_X$ is not necessarily continuous. This leads to a technically more involved proof which is presented in full detail in Sect. 4.

The following pictures give an impression of how the functions $x\mapsto F_x(z)$ are approximated by $\widehat{F}_x(z)$ for different values of z. We simulated a Poisson-INARCH process of order 1, where $X_t\mid X_{t-1},X_{t-2},\ldots \sim \hbox {Poisson}\big ( f(X_{t-1}) \big )$ and $f(x)=\min \big \{\alpha _0+\alpha _1 x, \beta \big \}$. The parameters $\alpha _0$ and $\alpha _1$ are chosen as 2.0 and 0.5, respectively, and the truncation constant $\beta $ is set to 6.0. For a sample size $n=1000$ and $z=0,1,\ldots ,11$, the following pictures show $F_x(z)$ (red lines) and a corresponding estimate $\widehat{F}_x(z)$ (blue lines). These results are quite encouraging except for large values of x. We conjecture that this deficiency is caused by data sparsity in this region.

3 A new bootstrap method for Markov chains

Our estimator $\widehat{F}_x(z)$ can be used for bootstrapping Markov processes, and it is particularly suitable in case of Markov chains with a finite or countably infinite state space. In what follows we assume that $(X_t)_{t\in {\mathbb N}_0}$ is a stationary Markov chain which has a state space $D\subseteq {\mathbb N}_0$. Bootstrap variates $X_t^*$ are generated successively according to a slightly modified variant of our estimator $\widehat{F}_x(z)$. To prove consistency, we retain our monotonicity condition (A1), however, we replace (A2) by the following stronger condition which ensures that both the original and the bootstrap process satisfy a useful mixing condition and possess respective stationary distributions.

(A3) There exist a finite set $S=\big \{y\in D:\; y\le \bar{s} \hbox { and } P_X(\{y\})>0\big \}$, a probability measure Q on $({\mathbb N}_0,2^{{\mathbb N}_0})$, and constants $\delta >0$, $\kappa >0$, $\gamma >0$, and $C<\infty $ such that

(i)
$P(X_t\in S\mid X_{t-1}=x) \,\ge \, \delta \,>\, 0 \qquad \qquad \forall x\in {\mathbb N}_0$,
(ii)
$P(X_t=y \mid X_{t-1}=x) \,\ge \, \kappa \cdot Q(\{y\}) \qquad \forall x\in S,\;\forall y\in {\mathbb N}_0$,
(iii)
$P(X_t\ge x)\,\le \, C\, e^{-\gamma x}\qquad \forall x\in {\mathbb N}_0$.

(A3) (ii) means that the set S is a so-called small set and condition (A3) (i) ensures that this set can be reached from each point $x\in \Omega _X$ with a probability not smaller than $\delta $. It follows from these conditions that

$$\begin{aligned} \inf _x P\big ( X_{t+2}=y \mid X_t=x \big ) \,\ge \, \delta \cdot \kappa \cdot \, Q\big ( \{y\} \big ) \qquad \forall y\in {\mathbb N}_0. \end{aligned}$$

Hence, Doeblin’s minorization condition is satisfied and it follows that the process $(X_t)_{t\in {\mathbb N}_0}$ has a unique stationary distribution $P_X$, is geometrically ergodic, and is uniform ($\phi $-) mixing with exponentially decaying coefficients.; see e.g. Theorem 1 in Doukhan (1994 Sec. 2.4, p. 88). In particular, a stationary version of the process satisfies (A2). Note that condition (A3) is satisfied e.g. by a Poisson-INARCH(1) process if the function f is bounded. While (i) and (ii) are obviously fulfilled, (iii) follows from the upper tail bound

$$\begin{aligned} P\big ( Y\ge \lambda +x \big ) \,\le \, e^{-\frac{x^2}{2(\lambda +x)}} \qquad \forall x\ge 0 \end{aligned}$$

which holds for $Y\sim \hbox {Poisson}(\lambda )$; see Theorem 1 in Canonne (2017).

Before we fix the definition of our bootstrap process we check to what extent a process with transition distribution functions $\widehat{F}_x(\cdot )$ satisfies a suitable variant of condition (A3). It follows from Theorem 2.1 that

$$\begin{aligned} \widehat{F}_x(z) \,=\, F_x(z) \,+\, O_P\big ( n^{-1/3} \big ) \end{aligned}$$

if $P_X(\{x\})>0$. This implies that

$$\begin{aligned} P\Big ( \inf \big \{\widehat{F}_x(y)-\widehat{F}_x(y-1):\; x\in S,\, P_X(\{x\})>0\big \} \ge \frac{\kappa }{2}\, Q^*(\{y\}) \quad \forall y\in {\mathbb N}_0\Big ) \mathop {\longrightarrow }\limits _{n\rightarrow \infty }1,\nonumber \\ \end{aligned}$$

(3.1)

where e.g.

$$\begin{aligned} Q^*\big ( \{y\} \big ) \,=\, \left\{ \begin{array}{ll} Q\big ( \{y\} \big ) &{} \quad \hbox { if }\quad y\le \bar{y}, \\ 0 &{} \quad \hbox { if }\quad y>\bar{y} \end{array} \right. \end{aligned}$$

and $\bar{y}$ such that $Q(\{0,1,\ldots ,\bar{y}\})\ge 1/2$. Hence, a bootstrap process based on $\widehat{F}_x(\cdot )$ satisfies a variant of (A3) (ii) with a probability tending to 1.

For a variant of (A3) (i) to hold, it is important that $\inf _x \{\widehat{F}_x(\bar{s}):\, x\in {\mathbb N}_0\}>0$ is also satisfied with a probability tending to 1. This is not guaranteed to be true since the estimator $\widehat{F}_x(\bar{s})$ may get unreliable if x gets large. Indeed, the natural lower estimate of $\widehat{F}_x(\bar{s})$ is given by

$$\begin{aligned} \widehat{F}_x\big (\bar{s}\big ) \,\ge \, \inf _{u:\, u\le x} \frac{ \sum _{t=1}^n {\mathbb {1}}(X_t\le \bar{s},\, X_{t-1}\in [u,\infty ) ) }{ \#\{t\le n:\; X_{t-1}\in [u,\infty )\} }. \end{aligned}$$

However, the right-hand side of this inequality can get arbitrarily close to 0 if x is large since then $P_X( [x,\infty ))$ gets small. It is actually a well-known shortcoming of nonparametric isotonic/antitonic estimators that they get unreliable near the ends of the domain of the explanatory variable. In view of this problem, we modify $\widehat{F}_x(z)$ for large x. Let

$$\begin{aligned} \widetilde{x}:=\, \sup \big \{ x:\; \#\{t\le n:\; X_{t-1}\ge x\} \,\ge \, n^{2/3} \big \}. \end{aligned}$$

Then $\#\{t\le n:\, X_{t-1}\ge \widetilde{x}\}\ge n^{2/3}>\#\{t\le n:\, X_{t-1}>\widetilde{x}\}$. We define

$$\begin{aligned} \widehat{\widehat{F}}_x(z):=\, \left\{ \begin{array}{ll} \widehat{F}_x(z) &{} \qquad \hbox { if }\quad x\le \widetilde{x}, \\ \widehat{F}_{\widetilde{x}}(z) &{} \qquad \hbox { if } \quad x> \widetilde{x} \end{array} \right. . \end{aligned}$$

In what follows we show that the modified estimator $\widehat{\widehat{F}}_x(\cdot )$ actually satisfies a suitable variant of (A3). To take advantage of Lemma 4.2 we embed the random truncation point $\widetilde{x}$ between two nonrandom points, $\widetilde{x}_l$ and $\widetilde{x}_u$. Let $\widetilde{x}_l:=\sup \left\{ x:\, P_X\big ( [x,\infty ) \big )\ge 2n^{-1/3} \right\} $ and $\widetilde{x}_u:=\sup \big \{ x:\, P_X\big ( [x,\infty ) \big )\ge (1/2)n^{-1/3} \big \}$. Then $P_X\big ( [\widetilde{x}_l,\infty ) \big )\ge 2n^{-1/3}\ge P_X\big ( (\widetilde{x}_l,\infty ) \big )$ and $P_X\big ( [\widetilde{x}_u,\infty ) \big )\ge (1/2)n^{-1/3}\ge P_X\big ( (\widetilde{x}_u,\infty ) \big )$. Since $\widetilde{x}>\widetilde{x}_u$ implies that $\#\big \{t\le n:\, X_{t-1}>\widetilde{x}_u\big \}\ge n^{2/3}$ we obtain by Lemma 4.2 that

$$\begin{aligned} P\big ( \widetilde{x}> \widetilde{x}_u \big )\le & {} P\Big ( \# \big \{t\le n:\, X_{t-1}>\widetilde{x}_u \big \} \,\ge \, n^{2/3} \Big ) \\\le & {} P\Big ( \# \big \{t\le n:\, X_{t-1}>\widetilde{x}_u \big \} \,-\, nP_X\big ( (\widetilde{x}_u,\infty ) \big ) \,\ge \, n^{2/3}/2 \Big )\\&\,=\,&O\big ( n^{-1} \big ). \end{aligned}$$

On the other hand, if $\widetilde{x}\le \widetilde{x}_u$, then

$$\begin{aligned} \inf _x \widehat{\widehat{F}}_x\big ( \bar{s} \big ) \,=\, \widehat{F}_{\widetilde{x}}\big ( \bar{s} \big ) \,\ge \, \widehat{F}_{\widetilde{x}_u}\big ( \bar{s} \big ). \end{aligned}$$

Therefore,

$$\begin{aligned} P\Big ( \inf _x \widehat{\widehat{F}}_x\big ( \bar{s} \big ) \,\ge \, F_{\widetilde{x}_u}\big ( \bar{s} \big )/2 \Big ) \mathop {\longrightarrow }\limits _{n\rightarrow \infty }1. \end{aligned}$$

(3.2)

Furthermore, since $\widetilde{x}<\widetilde{x}_l$ implies that $\#\big \{t\le n:\, X_{t-1}\ge \widetilde{x}_l\big \}< n^{2/3}$ we obtain by Lemma 4.2 that

$$\begin{aligned} P\big ( \widetilde{x}< \widetilde{x}_l \big )\le & {} P\Big ( \# \big \{t\le n:\, X_{t-1}\ge \widetilde{x}_l \big \} \,<\, n^{2/3} \Big ) \\\le & {} P\Big ( \# \big \{t\le n:\, X_{t-1}\ge \widetilde{x}_l \big \} \,-\, n P_X\big ( [\widetilde{x}_l,\infty ) \big ) \,<\, -(n/2) P_X\big ( [\widetilde{x}_l,\infty ) \big ) \Big )\\&\,=\,&O\big ( n^{-1} \big ). \end{aligned}$$

On the other hand, $\widetilde{x}\ge \widetilde{x}_l$ yields that $\widehat{\widehat{F}}_x(z)=\widehat{F}_x(z)$ for all $x\le \widetilde{x}_l$. Hence, we obtain, for each $z\in {\mathbb N}_0$,

$$\begin{aligned}{} & {} { E\left[ \int \big | \widehat{\widehat{F}}_x(z) \,-\, F_x(z) \big | \, dP_X(x) \right] } \nonumber \\{} & {} \quad = E\left[ \int _{\{x:\, x\le \widetilde{x}_l\}} \big | \widehat{F}_x(z) \,-\, F_x(z) \big | \, dP_X(x) \right] \,+\, P_X\big ( (\widetilde{x}_l,\infty ) \big ) \,+\, O\big ( n^{-1} \big ) \nonumber \\{} & {} \quad = O\big ( n^{-1/3} \big ). \end{aligned}$$

(3.3)

Now we are in a position to define our resampling algorithm generating the bootstrap variates:

1.
Choose a starting value $X_0^*$.
2.
For each $t\in {\mathbb N}_0$, suppose that $X_0^*,\ldots ,X_t^*$ have been generated already. Then $X_{t+1}^*$ is generated such that is has, conditioned on $X_0^*,\ldots ,X_t^*$ and conditioned on the original sample $X_0,\ldots ,X_n$, a probability distribution function $\widehat{\widehat{F}}_{X_t^*}(\cdot )$.

In what follows, the symbol $P^*$ refers to the distribution of the bootstrap variables conditioned on the original sample, e.g. $P^*\big (X_t^*\in A\big )=P\big (X_t^*\in A\mid X_0,\ldots ,X_n\big )$.

Let $K_n=\log n/(3\gamma )$. Then (A3)(iii) implies that $P(X_t> K_n)=O(n^{-1/3})$ and we obtain from (3.3) that

$$\begin{aligned}{} & {} { \sum _{y=0}^\infty \sum _{x=0}^\infty \big | P^*\big ( X^*_{t+1}=y\mid X^*_t=x \big ) \,-\, P\big ( X_{t+1}=y \mid X_t=x \big ) \big | \; P_X(\{x\}) } \nonumber \\{} & {} \quad = \sum _{y=0}^{K_n} \sum _{x=0}^\infty \big | P^*\big ( X^*_{t+1}=y\mid X^*_t=x \big ) \,-\, P\big ( X_{t+1}=y \mid X_t=x \big ) \big | \; P_X(\{x\}) \nonumber \\{} & {} \qquad +\, \sum _{x=0}^\infty \big ( P^*\big ( X^*_{t+1}>K_n\mid X^*_t=x \big ) \,+\, P\big ( X_{t+1}>K_n \mid X_t=x \big ) \big ) \; P_X(\{x\}) \nonumber \\{} & {} \quad \le O\big ( K_n \, n^{-1/3} \big ) +\, \sum _{x=0}^\infty \big | P^*\big ( X^*_{t+1}>K_n\mid X^*_t=x \big ) \,-\, P\big ( X_{t+1}>K_n \mid X_t=x \big ) \big |\nonumber \\{} & {} \qquad \; P_X(\{x\}) \,+\, 2\,P\big ( X_t>K_n \big ) \nonumber \\{} & {} \quad = O_P\big ( n^{-1/3} \log n \big ). \end{aligned}$$

(3.4)

Note that for all $t\in {\mathbb N}$ $X_t^*$ takes values in $\{X_1,\ldots ,X_n\}$, i.e. $X_t^*$ lies in the collection of x with $P_X(\{x\})>0$. Hence, it follows from (3.1) and (3.2) that a process with transition distribution functions $\widehat{\widehat{F}}_x(\cdot )$ satisfies the following Doeblin-type condition with a probability tending to 1:

$$\begin{aligned} \inf _x P^*\big ( X_{t+2}^*=y \mid X_t^*=x \big )\ge & {} \inf _x \sum _{z\in S} P^*\big ( X_{t+2}^*=y \mid X_{t+1}^*=z \big ) \\{} & {} \; P^*\big ( X_{t+1}^*=z \mid X_t^*=x \big ) \\\ge & {} \frac{\kappa \, \delta }{4} \, Q^*\big ( \{y\} \big ) \qquad \forall y\in {\mathbb N}_0. \end{aligned}$$

This implies that, with a probability tending to 1, the bootstrap process is geometrically ergodic and has a unique stationary distribution $P^*_{X^*}$.

For a successful application of the bootstrap approximation the following properties are vitally important: With a probability tending to 1, conditioned on $X_0,\ldots ,X_n$,

(a)
the stationary distribution $P^*_{X^*}$ converges to $P_X$,
(b)
the finite-dimensional distributions of $(X_t^*)_{t}$ converge to those of $(X_t)_t$.

We show these two properties by a coupling of the original process and its bootstrap counterpart, i.e. we define versions $(\widetilde{X}_t)_{t\in {\mathbb N}_0}$ and $(\widetilde{X}_t^*)_{t\in {\mathbb N}_0}$ on a common probability space $(\widetilde{\Omega },\widetilde{\mathcal A},\widetilde{P})$ such that the corresponding random variables $\widetilde{X}_t$ and $\widetilde{X}_t^*$ are equal with a high probability. We use the technique of maximal coupling [see e.g. Theorem 5.2 in Chapter I in Lindvall (1992)] and define the transition probabilities $\widetilde{\pi }$ driving the coupled process $\big ( (\widetilde{X}_t,\widetilde{X}_t^*) \big )_{t\in {\mathbb N}_0}$ as follows. For $x,y\in {\mathbb N}_0$, let $\pi (x,y)=P(X_{t+1}=y\mid X_t=x)$ and $\pi ^*(x,y)=P^*(X_{t+1}^*=y\mid X_t^*=x)$. Then

$$\begin{aligned} \delta _{x,x^*} \,=\, \frac{1}{2} \sum _{y\in {\mathbb N}_0} \big | \pi (x,y) \,-\, \pi ^*(x^*,y) \big | \end{aligned}$$

is the total variation distance between the distributions with respective probability mass functions $\pi (x,\cdot )$ and $\pi ^*(x^*,\cdot )$. Note that $\delta _{x,x^*}=\sum _y [\pi (x,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\}] =\sum _y [\pi ^*(x^*,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\}]$. The transition probabilities of the coupled process are defined as

$$\begin{aligned} \widetilde{\pi }\big ( (x,x^*), (y,y) \big ) \,=\, \min \big \{ \pi (x,y), \pi ^*(x^*,y) \big \} \qquad \forall x,x^*,y\in {\mathbb N}_0, \end{aligned}$$

(3.5a)

and, for $x,x^*,y,y^*\in {\mathbb N}_0$ such that $y\ne y^*$,

$$\begin{aligned}{} & {} { \widetilde{\pi }\big ( (x,x^*), (y,y^*) \big ) } \nonumber \\{} & {} \quad = \left\{ \begin{array}{ll} 0 &{} \quad \hbox { if }\quad \delta _{x,x^*}=0, \\ \frac{[\pi (x,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\} ]\, [\pi ^*(x^*,y^*)-\min \{\pi (x,y^*),\pi ^*(x^*,y^*)\}] }{ \delta _{x,x^*} } &{} \quad \hbox { if }\quad \delta _{x,x^*}\ne 0. \end{array} \right. \nonumber \\ \end{aligned}$$

(3.5b)

The corresponding Markov kernel $\widetilde{P}$ is defined as

$$\begin{aligned}{} & {} \widetilde{P}\big ( (\widetilde{X}_{t+1},\widetilde{X}_{t+1}^*)\in A\mid (\widetilde{X}_t,\widetilde{X}_t^*)=(x,x^*) \big )\nonumber \\{} & {} \quad \,=\, \sum _{(y,y^*)\in A} \widetilde{\pi }\big ( (x,x^*), (y,y^*) \big ) \qquad \forall A\subseteq {\mathbb N}_0\times {\mathbb N}_0. \end{aligned}$$

Note that $[\pi (x,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\} ]\, [\pi ^*(x^*,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\}]=0$ for all $x,x^*,y\in {\mathbb N}_0$, which implies in case of $\delta _{x,x^*}>0$ that

$$\begin{aligned}{} & {} { \sum _{y^*:\; y^*\ne y} \frac{[\pi (x,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\} ]\, [\pi ^*(x^*,y^*)-\min \{\pi (x,y^*),\pi ^*(x^*,y^*)\}] }{ \delta _{x,x^*} } } \\{} & {} \quad = [\pi (x,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\} ] \; \sum _{y^*\in {\mathbb N}_0} \frac{\pi ^*(x^*,y^*)-\min \{\pi (x,y^*),\pi ^*(x^*,y^*)\} }{ \delta _{x,x^*} } \\{} & {} \quad = \pi (x,y) \,-\, \min \{\pi (x,y),\pi ^*(x^*,y)\}. \end{aligned}$$

Therefore we obtain

$$\begin{aligned}{} & {} { \widetilde{P}\big ( \widetilde{X}_{t+1}=y \mid (\widetilde{X}_t,\widetilde{X}_t^*)=(x,x^*) \big ) } \\{} & {} \quad = \sum _{y^*\in {\mathbb N}_0} \widetilde{\pi }\big ( (x,x^*), (y,y^*) \big ) \\{} & {} \quad = \left\{ \begin{array}{ll} \pi (x,y) &{} \quad \hbox { if } \quad \delta _{x,x^*}=0, \\ &{} \\ \min \{ \pi (x,y), \pi ^*(x^*,y)\} \,+\, \sum _{y^*:\; y^*\ne y}&{}\\ \frac{[\pi (x,y)-\min \{\pi (x,y),\pi ^*(x^*,y)\} ]\, [\pi ^*(x^*,y^*)-\min \{\pi (x,y^*),\pi ^*(x^*,y^*)\}] }{ \delta _{x,x^*} } &{} \quad \hbox { if } \quad \delta _{x,x^*}\ne 0 \end{array} \right. \\{} & {} \quad = \pi (x,y) \,=\, P\big ( X_{t+1}=y \mid X_t=x \big ) \end{aligned}$$

and, likewise,

$$\begin{aligned} \widetilde{P}\big ( \widetilde{X}_{t+1}^*=y^* \mid (\widetilde{X}_t,\widetilde{X}_t^*)=(x,x^*) \big ) \,=\, \pi ^*(x^*,y^*) \,=\, P^*\big ( X_{t+1}^*=y^* \mid X_t^*=x^* \big ). \end{aligned}$$

Moreover, we have that

$$\begin{aligned}{} & {} \widetilde{P}\big ( \widetilde{X}_{t+1}=\widetilde{X}_{t+1}^*\mid (\widetilde{X}_t,\widetilde{X}_t^*)=(x,x^*) \big )\,=\, \sum _{y\in {\mathbb N}_0} \min \big \{ \pi (x,y), \pi ^*(x^*,y) \big \}\nonumber \\{} & {} \quad \,=\, 1 \,-\, \delta _{x,x^*}. \end{aligned}$$

(3.6)

Hence, the conditional probability that the two random variables at time $t+1$ are equal is maximized which explains the usage of the term maximal coupling.

The following theorem summarizes the results of our coupling approach. The stationary distribution $P^*_{X^*}$ of the bootstrap process converges in total variation norm and in probability to that of the original process. With a probability tending to 1, the coupled process $\big ((\widetilde{X}_t,\widetilde{X}_t^*)\big )_{t\in {\mathbb N}_0}$ is geometrically $\phi $-mixing. And finally, the corresponding values $\widetilde{X}_t^0$ and $\widetilde{X}_t^{*,0}$ of a stationary version of the coupled process coincide with a probability converging to 1.

Theorem 3.1

Suppose that (A1) and (A3) are fulfilled. Then

(i)
$d_{TV}\big ( P^*_{X^*}, P_X \big ) \,=\, O_{\widetilde{P}}\big ( n^{-1/3} \, (\log n)^2 \big )$.
(ii)
With a probability tending to 1, the process $\big ((\widetilde{X}_t,\widetilde{X}_t^*)\big )_{t\in {\mathbb N}_0}$ is $\phi $-mixing with coefficients $\phi _{\widetilde{X},\widetilde{X}^*}(k)$ decaying at a geometric rate.
(iii)
If $\big ((\widetilde{X}_t^0, \widetilde{X}_t^{*,0})\big )_{t\in {\mathbb N}_0}$ is a stationary version of the coupled process, then
$$\begin{aligned} \widetilde{P}\big ( \widetilde{X}_t^0 \ne \widetilde{X}_t^{*,0} \big ) \,=\, O_{\widetilde{P}}\big ( n^{-1/3} (\log n)^2 \big ). \end{aligned}$$

These general results can be used to prove bootstrap consistency for specific statistics. Suppose that $X_0,\ldots ,X_n$ are observed and that (A1) and (A3) are fulfilled. To illustrate our advocated approach, we consider e.g. the parameter $\theta :=P(X_{t-1}=x,X_t=y)$, which is consistently estimated by $\widehat{\theta }_n=n^{-1}\sum _{t=1}^n {\mathbb {1}}(X_{t-1}=x,X_t=y)$. It follows from a central limit theorem for $\phi $-mixing processes (see e.g. Theorem 15.12 in Bradley (2007b)) that

$$\begin{aligned} S_n:=\, \sqrt{n} \big ( \widehat{\theta }_n - \theta \big ) \,{\mathop {\longrightarrow }\limits ^{d}}\, Y \sim {\mathcal N}\left( 0, \sigma _\infty ^2\right) , \end{aligned}$$

where $\sigma _\infty ^2=\sum _{k=-\infty }^\infty \mathop {\textrm{cov}}\nolimits \big ( {\mathbb {1}}(X_0=x,X_1=y), {\mathbb {1}}(X_{|k|}=x,X_{|k|+1}=y) \big )$. The distribution of $S_n$ can be approximated by that of its bootstrap counterpart,

$$\begin{aligned} S_n^*:=\, \sqrt{n} \big ( \widehat{\theta }_n^* - E^*\widehat{\theta }_n^* \big ), \end{aligned}$$

where $\widehat{\theta }_n^*=n^{-1}\sum _{t=1}^n {\mathbb {1}}(X_{t-1}^*=x,X_t^*=y)$ and $E^*\widehat{\theta }_n^*=E\big (\widehat{\theta }_n^*\,\big |\,X_0,\ldots ,X_n)$. In order to prove that the distribution of $S_n^*$ converges in probability to the same limit as that of $S_n$, we could use a central limit theorem for triangular arrays of $\phi $-mixing random variables. Alternatively, we can use our coupling results and obtain bootstrap consistency almost for free. It follows from (iii) of Theorem 3.1 that

$$\begin{aligned} \widetilde{P}\Big ( (\widetilde{X}_{t-1},\widetilde{X}_t)\ne (\widetilde{X}_{t-1}^*,\widetilde{X}_t^*) \big ) \,=\, O_{\widetilde{P}}\big ( n^{-1/3} \, (\log n)^2\big ). \end{aligned}$$

Using this and a covariance inequality for $\phi $-mixing random variables [see e.g. Theorem 1.2.2.3 in Doukhan (1994, p. 9)] we obtain

$$\begin{aligned}{} & {} { \big | \mathop {\textrm{cov}}\nolimits \big ( {\mathbb {1}}(\widetilde{X}_{s-1},\widetilde{X}_s) - {\mathbb {1}}(\widetilde{X}_{s-1}^*,\widetilde{X}_s^*), {\mathbb {1}}(\widetilde{X}_{t-1},\widetilde{X}_t) - {\mathbb {1}}(\widetilde{X}_{t-1}^*,\widetilde{X}_t^*) \big ) \big | } \\{} & {} \quad \le 2\, \phi \big ( |s-t|-1 \big ) \, \big \Vert {\mathbb {1}}(\widetilde{X}_{s-1},\widetilde{X}_s) - {\mathbb {1}}(\widetilde{X}_{s-1}^*,\widetilde{X}_s^*) \big \Vert _1 \, \big \Vert {\mathbb {1}}(\widetilde{X}_{t-1},\widetilde{X}_t) - {\mathbb {1}}(\widetilde{X}_{t-1}^*,\widetilde{X}_t^*) \big \Vert _\infty \\{} & {} \quad = O_{\widetilde{P}} \big ( \phi _{\widetilde{X},\widetilde{X}^*}(|s-t|-1 \big ) \, n^{-1/3} \, (\log n)^2 \big ), \end{aligned}$$

which implies that

$$\begin{aligned}{} & {} { \widetilde{E} \Big | \big ( \widetilde{S}_n \,-\, \widetilde{S}_n^* \big )^2 \Big ] } \\{} & {} \quad = \frac{1}{n} \sum _{s,t=1}^n \mathop {\textrm{cov}}\nolimits \big ( {\mathbb {1}}(\widetilde{X}_{s-1},\widetilde{X}_s) - {\mathbb {1}}(\widetilde{X}_{s-1}^*,\widetilde{X}_s^*), {\mathbb {1}}(\widetilde{X}_{t-1},\widetilde{X}_t) - {\mathbb {1}}(\widetilde{X}_{t-1}^*,\widetilde{X}_t^*) \big ) \\{} & {} \quad = O_{\widetilde{P}} \big ( \phi _{\widetilde{X},\widetilde{X}^*}(|s-t|-1 ) \, n^{-1/3} \, (\log n)^2 \big ). \end{aligned}$$

This implies that

$$\begin{aligned} S_n^* \,{\mathop {\longrightarrow }\limits ^{d}}\, Y \qquad \hbox {in probability}. \end{aligned}$$

If in addition $\sigma _\infty ^2>0$, then we obtain by Lemma 2.11 of van der Vaart (1998) that

$$\begin{aligned} \sup _x \big | P\big ( S_n\le x \big ) \,-\, P\big ( S_n^*\le x \,\big |\, X_0,\ldots ,X_n \big ) \big | \,{\mathop {\longrightarrow }\limits ^{P}}\, 0. \end{aligned}$$

Hence, we can use bootstrap quantiles to construct confidence intervals for $\theta $ such that their coverage probability converges to a prescribed level. Similar implications for other types of statistics are discussed in Leucht and Neumann (2013) and Neumann (2021).

Remark 1

In a similar context, Paparoditis and Politis (2002, Theorem 3.3) proved almost sure convergence of the bootstrap stationary distribution to the stationary distribution of the original process. Their method of proof is completely different from ours and employs classical tools from the theory of weak convergence such as Helly’s theorem and the “uniqueness trick” which uses the fact that each subsequence contains a further subsequence converging to the same probability measure. We use a more direct approach based on a coupling of the original and the bootstrap process. The additional benefit is that we obtain a rate of convergence rather than consistency only.

The following pictures give an impression of the effect of our coupling. As done for the pictures displayed in the previous section, we simulated a Poisson-INARCH process of order 1, where $X_t\mid X_{t-1},X_{t-2},\ldots \sim \hbox {Poisson}\big ( f(X_{t-1}) \big )$ and $f(x)=\min \big \{\alpha _0+\alpha _1 x, \beta \big \}$. The parameters $\alpha _0$ and $\alpha _1$ are chosen as 2.0 and 0.5, respectively, and the truncation constant $\beta $ is set to 6.0. For respective sample sizes of $n=200$ and $n=1000$, Figs. 1 and 2 show one realization of independent and coupled versions of $X_1,\ldots ,X_{50}$ and $X_1^*,\ldots ,X_{50}^*$. While the pictures on the left of Figs. 1 and 2 let us at best hope for a similar behavior of the bootstrap and the original process, those on the right provide some evidence that the bootstrap process successfully mimics the behavior of the original process.

4 Proofs

4.1 Proofs of the main results

Proof of Theorem 2.1

Our strategy to prove this result is already sketched in Sect. 2, in the special case where the distribution function $F_X$ associated to $P_X$ is continuous. In the general case with a possibly discontinuous function $F_X$, we have to take great care since we cannot split the domain D into intervals $I_k$ such that $P_X(I_k)=1/k_n$, where $k_n=\lfloor n^{1/3}\rfloor $. It could be the case that $P_X$ has masses considerably larger than $1/k_n$ at single points which requires a modification of our previous approach.

To obtain an appropriate collection of intervals $I_k$, we define again suitable grid points $x_0,x_n,\ldots ,x_{K_n}$. For technical reasons we choose them as a decreasing sequence. We set $x_0:=\infty $ and define recursively $x_k:=\inf \{x:\, P_X((x,x_{k-1}))\le n^{-1/3}\}$ for $k\ge 1$. This procedure will terminate when $x_{K_n}=0$ and $D={\mathbb R}$ or when $x_{K_n}=-\infty $, for some $K_n$. In both cases we have that $D=[x_1,x_0)\cup \cdots \cup [x_{K_n},x_{K_n-1})$. For $k=1,\ldots ,K_n-1$, i.e. with a possible exception of $k=K_n$, we have

$$\begin{aligned} P_X\big ( (x_k,x_{k-1}) \big ) \,\le \, n^{-1/3} \,\le \, \lim _{m\rightarrow \infty } P_X\big ( (x_k-1/m,x_{k-1}) \big ) \,=\, P_X\big ( [x_k,x_{k-1}) \big ), \end{aligned}$$

where the latter equality follows since the probability measure $P_X$ is continuous from above. In the following we show that

$$\begin{aligned} E\left[ \int _D \big (\widehat{F}_x(z)-F_x(z)\big )^+ \,dP_X(x) \right] \,=\, O\big ( n^{-1/3} \big ). \end{aligned}$$

(4.1)

To this end, we consider the contributions by $E\big [ \int _{[x_k,x_{k-1})} (\widehat{F}_x(z)-F_x(z))^+\,dP_X(x)\big ]$ separately. We distinguish between three possible cases.

Case 1 If $P_X\big ([x_k,x_{k-1})\big )\le 2n^{-1/3}$ and $k<K_n$, then we use for all $x\in [x_k,x_{k-1})$ in case of $N_{n,k}:=\big \{t\le n:\, X_{t-1}\in [x_k,x_{k-1})\big \}\ne 0$ the estimate

$$\begin{aligned} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+\le & {} \big ( \widehat{F}_{x_k}(z) \,-\, F_{x_{k-1}}(z) \big )^+ \\\le & {} \max _{v:\, v\ge x_k} \left\{ \frac{\big |\sum _{t=1}^n [{\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z)]\, {\mathbb {1}}(X_{t-1}\in [x_{k+1},v]) \big |}{ \#\{t\le n:X_{t-1}\in [x_{k+1},v]\} \vee 1 } \right\} \\{} & {} {} \,+\, \big [ F_{x_{k+1}}(z) \,-\, F_{x_{k-1}}(z) \big ], \end{aligned}$$

which leads to

$$\begin{aligned}{} & {} { E\left[ \int _{[x_k,x_{k-1})} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, dP_X(x)\, {\mathbb {1}}_{\{N_{n,k}\ne 0\}} \right] } \nonumber \\{} & {} \quad \le P_X\big ( [x_k,x_{k-1}) \big ) \;\nonumber \\{} & {} \qquad \left\{ E\left[ \max _{v:\, v\ge x_k} \left\{ \frac{\big |\sum _{t=1}^n [{\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z)] {\mathbb {1}}(X_{t-1}\in [x_{k+1},v]) \big |}{ \#\{t\le n:X_{t-1}\in [x_{k+1},v]\} \vee 1 } \right\} \right] \right. \nonumber \\{} & {} \qquad \left. +\, \big [ F_{x_{k+1}}(z) \,-\, F_{x_{k-1}}(z) \big ] \right\} \nonumber \\{} & {} \quad = O\Big ( P_X\big ( [x_k,x_{k-1}) \big ) \; \big \{ n^{-1/3} \,+\, (F_{x_{k+1}}(z) - F_{x_{k-1}}(z)) \big \} \Big ). \end{aligned}$$

(4.2a)

Case 2 If $P_X\big ([x_k,x_{k-1})\big )> 2n^{-1/3}$ then $P_X$ has at $x_k$ a point mass greater than $n^{-1/3}$ and we argue differently. In this case, we use for all $x\in (x_k,x_{k-1})$ in case of $N_{n,k}:=\big \{t\le n:\, X_{t-1}=x_k\big \}\ne 0$ the estimate

$$\begin{aligned} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+\le & {} \max _{v:\, v\ge x_k} \left\{ \frac{\big |\sum _{t=1}^n [{\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z)] \, {\mathbb {1}}(X_{t-1}\in [x_k,v]) \big |}{ \#\{t\le n:\, X_{t-1}\in [x_k,v]\} \vee 1 } \right\} \\{} & {} {} \,+\, \big [ F_{x_k}(z) \,-\, F_{x_{k-1}}(z) \big ], \end{aligned}$$

which implies

$$\begin{aligned}{} & {} { E\left[ \int _{(x_k,x_{k-1})} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, dP_X(x)\, {\mathbb {1}}_{\{N_{n,k}\ne 0\}} \right] } \nonumber \\{} & {} \quad \le P_X\big ( (x_k,x_{k-1}) \big ) \;\nonumber \\{} & {} \qquad \left\{ E\left[ \max _{v:\, v\ge x_k} \left\{ \frac{\big |\sum _{t=1}^n [{\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z)] \, {\mathbb {1}}(X_{t-1}\in [x_k,v]) \big |}{ \#\{t\le n:\, X_{t-1}\in [x_k,v]\} \vee 1 } \right\} \right] \right. \nonumber \\{} & {} \qquad \left. +\, \big [ F_{x_k}(z) \,-\, F_{x_{k-1}}(z) \big ] \right\} \nonumber \\{} & {} \quad = O\Big ( P_X\big ( (x_k,x_{k-1}) \big ) \; \big \{ n^{-1/3} \,+\, (F_{x_k}(z) - F_{x_{k-1}}(z)) \big \} \Big ). \end{aligned}$$

(4.2b)

For $x=x_k$, we use the simpler estimate

$$\begin{aligned} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+\le & {} \max _{v:\, v>x_k} \left\{ \frac{\big |\sum _{t=1}^n [{\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z)] \, {\mathbb {1}}(X_{t-1}\in [x_k,v]) \big |}{ \#\{t\le n:\, X_{t-1}\in [x_k,v]\} \vee 1 } \right\} , \end{aligned}$$

and we obtain

$$\begin{aligned}{} & {} { E\left[ \int _{ \{x_k\} } \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, dP_X(x) \,{\mathbb {1}}_{\{N_{n,k}\ne 0\}} \right] } \nonumber \\{} & {} \quad \le P_X\big ( \{x_k\} \big ) \; E\left[ \max _{v:\, v>x_k} \left\{ \frac{\big |\sum _{t=1}^n [{\mathbb {1}}(X_t\le z) \,-\, F_{X_{t-1}}(z)] {\mathbb {1}}(X_{t-1}\in [x_k,v]) \big |}{ \#\{t\le n:\, X_{t-1}\in [x_k,v]\} \vee 1 } \right\} \; {\mathbb {1}}_{A_n} \right] \nonumber \\{} & {} \quad = O\Big ( P_X\big ( \{x_k\} \big ) \; n^{-1/3} \Big ). \end{aligned}$$

(4.2c)

Case 3 If $P_X([x_{K_n},x_{K_n-1}))\le 2n^{-1/3}$, then we can simply use the estimate

$$\begin{aligned} E\Bigg [ \int _{[x_{K_n},x_{K_n-1})} \big ( \widehat{F}_x(z) \,-\, F_x(z) \big )^+ \, dP_X(x) \,\le \, 2\, n^{-1/3}. \end{aligned}$$

(4.2d)

Finally, it follows from Lemma 4.2 that $P\big ( \bigcup _k \{\omega :N_{n,k}(\omega )=0 \} \big )=O(n^{-1/3})$, which implies that

$$\begin{aligned}{} & {} E\left[ \int _D \big (\widehat{F}_x(z)-F_x(z)\big )^+ \,dP_X(x) \; {\mathbb {1}}_{\bigcup _k \{N_{n,k}=0\}} \right] \nonumber \\{} & {} \qquad \,\le \, P\left( \bigcup _k \{\omega :N_{n,k}(\omega )=0 \} \right) \,=\, O\big ( n^{-1/3} \big ). \end{aligned}$$

(4.2e)

From (4.2a) to (4.2e) we obtain (4.1).

The term $\int _D (\widehat{F}_x(z)-F_x(z))^-\,dP_X(x)$ can be analogously estimated which completes the proof of the theorem. $\square $

Proof of Theorem 3.1

(i)
We construct a coupling of the original process and its bootstrap counterpart, where we use $\widetilde{\pi }\big ((x,x^*),(y,y^*)\big )$ defined by (3.5a) and (3.5b) as transition probabilities and $\widetilde{P}$ as transition kernel. The initial values are chosen such that $\widetilde{X}_0=\widetilde{X}^*_0 \sim P_X$. Then, for each $t\in {\mathbb N}_0$, conditioned on $(\widetilde{X}_t,\widetilde{X}^*_t)$, the next pair $(\widetilde{X}_{t+1},\widetilde{X}^*_{t+1})$ is generated according to $\widetilde{P}$. It follows from (3.4) and (3.6) in particular that
$$\begin{aligned}{} & {} \widetilde{P}\big ( \widetilde{X}_{t+1} \ne \widetilde{X}^*_{t+1}, \, \widetilde{X}_t=\widetilde{X}^*_t \big )\\{} & {} \quad = \sum _{x\in {\mathbb N}_0} \widetilde{P}\big ( \widetilde{X}_{t+1}\ne \widetilde{X}_{t+1}^*\mid \widetilde{X}_t=\widetilde{X}_t^*=x \big ) \, \widetilde{P}\big ( \widetilde{X}_t=\widetilde{X}_t^*=x \big ) \\{} & {} \quad = \sum _{x\in {\mathbb N}_0} \delta _{x,x} \, \widetilde{P}\big ( \widetilde{X}_t=\widetilde{X}_t^*=x \big ) \\{} & {} \quad \le \frac{1}{2} \, \sum _x \sum _y \big | \pi (x,y) \,-\, \pi ^*(x,y) \big | \, P_X(\{x\})\\{} & {} \quad = O_{\widetilde{P}}\big ( n^{-1/3} \, \log n \big ). \end{aligned}$$
This implies first
$$\begin{aligned} \widetilde{P}\big ( \widetilde{X}_1\ne \widetilde{X}_1^* \big ) \,=\, O_{\widetilde{P}}\big ( n^{-1/3} \, \log n \big ), \end{aligned}$$
then
$$\begin{aligned}{} & {} \widetilde{P}\big ( \widetilde{X}_2\ne \widetilde{X}_2^* \big ) \,\le \, \widetilde{P}\big ( \widetilde{X}_2\ne \widetilde{X}_2^*,\, \widetilde{X}_1=\widetilde{X}_1^* \big ) \,+\, \widetilde{P}\big ( \widetilde{X}_1\ne \widetilde{X}_1^* \big )\\{} & {} \quad = O_{\widetilde{P}}\big ( n^{-1/3} \, \log n \big ), \end{aligned}$$
and after $K_n$ such steps
$$\begin{aligned} d_{TV}\big ( P_X, \widetilde{P}^{\widetilde{X}_{K_n}^*} \big ) \,\le \, \widetilde{P}\big ( \widetilde{X}_{K_n}\ne \widetilde{X}_{K_n}^* \big ) \,=\, O_{\widetilde{P}}\big ( n^{-1/3} \, \log n \, K_n\big ). \end{aligned}$$
On the other hand, $(X_t^*)_{t\in {\mathbb N}_0}$, and therefore $(\widetilde{X}_t^*)_{t\in {\mathbb N}_0}$ as well, are geometrically ergodic. Hence, for $K_n=K \log n$ and K sufficiently large,
$$\begin{aligned} d_{TV}\left( \widetilde{P}^{\widetilde{X}_{K_n}^*}, P^*_{X^*} \right) \,=\, O_{\widetilde{P}}\big ( n^{-1/3} \big ), \end{aligned}$$
which leads to
$$\begin{aligned} d_{TV}\big ( P_X, P^*_{X^*} \big )\le & {} d_{TV}\left( P_X, \widetilde{P}^{\widetilde{X}_{K_n}^*} \right) \,+\, d_{TV}\left( \widetilde{P}^{\widetilde{X}_{K_n}^*}, P^*_{X^*} \right) \\= & {} O_{\widetilde{P}}\big ( n^{-1/3} \, (\log n)^2 \big ). \end{aligned}$$
(ii)
We couple the original and the bootstrap process according to (3.5a) and (3.5b) and show first that
$$\begin{aligned}{} & {} \widetilde{P} \big ( (\widetilde{X}_t,\widetilde{X}_t^*)\in S\times S \,\big |\, \widetilde{X}_{t-1}=x, \widetilde{X}_{t-1}^*=x^* \big )\nonumber \\{} & {} \quad \,\ge \, P\big ( X_t\in S \,\big |\, X_{t-1}=x \big ) \cdot P^*\big ( X_t^*\in S \,\big |\, X_{t-1}^*=x^* \big ) \end{aligned}$$
(4.3)
holds for all $x,x^*\in {\mathbb N}_0$. Let $x,x^*\in {\mathbb N}_0$ be arbitrary. To simplify notation we set, for a generic set $B\subseteq {\mathbb N}_0$, $\pi (B)=\sum _{y\in B}\pi (x,y)$, $\pi ^*(B)=\sum _{y\in B}\pi ^*(x^*,y)$, and $\pi \wedge \pi ^*(B)=\sum _{y\in B}\pi (x,y)\wedge \pi ^*(x^*,y)$. If $\pi \wedge \pi ^*(S)\ge \pi (S)\cdot \pi ^*(S)$, then (4.3) follows immediately. Suppose now the opposite, $\pi \wedge \pi ^*(S)<\pi (S)\cdot \pi ^*(S)$. Then $\delta _{x,x^*}>0$, and it follows from (3.5a) and (3.5b)
$$\begin{aligned}{} & {} { \widetilde{P} \big ( (\widetilde{X}_t,\widetilde{X}_t^*)\in S\times S \,\big |\, \widetilde{X}_{t-1}=x, \widetilde{X}_{t-1}^*=x^* \big ) } \\{} & {} \quad = \sum _{y\in S} \pi (x,y)\wedge \pi ^*(x^*,y), \\{} & {} \qquad +\, \sum _{y,y^*\in S} \frac{ \big ( \pi (x,y) - \pi (x,y)\wedge \pi ^*(x^*,y) \big ) \, \big ( \pi ^*(x^*,y^*) - \pi (x,y^*)\wedge \pi ^*(x^*,y^*) \big ) }{ \delta _{x,x^*} } \\{} & {} \quad = \pi \wedge \pi ^*(S) \,+\, \frac{ 1 }{ \delta _{x,x^*} } \big ( \pi (S) - \pi \wedge \pi ^*(S) \big ) \big ( \pi ^*(S) - \pi \wedge \pi ^*(S) \big ). \\{} & {} \quad = \pi (S) \cdot \pi ^*(S) +\, \frac{ 1 }{ \delta _{x,x*} } \Big \{ \delta _{x,x^*} \big ( \pi \wedge \pi ^*(S) \,-\, \pi (S) \pi ^*(S) \big )\\{} & {} \qquad +\, \big ( \pi (S) - \pi \wedge \pi ^*(S) \big ) \big ( \pi ^*(S) - \pi \wedge \pi ^*(S) \big ) \Big \}. \end{aligned}$$
Since $\delta _{x,x^*}\,=\,1-\pi \wedge \pi ^*({\mathbb N}_0)\,=\,\big (\pi (S)-\pi \wedge \pi ^*(S)\big )+\big (\pi (S^c)-\pi \wedge \pi ^*(S^c)\big )$ the term in curly braces is equal to
$$\begin{aligned}{} & {} \big (\pi (S)-\pi \wedge \pi ^*(S)\big ) \, \big ( \pi \wedge \pi ^*(S) \,-\, \pi (S) \pi ^*(S) \big )\\{} & {} \qquad \,+\, \big (\pi (S^c)-\pi \wedge \pi ^*(S^c)\big ) \, \big ( \pi \wedge \pi ^*(S) \,-\, \pi (S) \pi ^*(S) \big ) \\{} & {} \qquad {} \,+\, \big (\pi (S)-\pi \wedge \pi ^*(S)\big ) \, \big (\pi ^*(S)-\pi \wedge \pi ^*(S)\big ) \\{} & {} \quad = \big (\pi (S)-\pi \wedge \pi ^*(S)\big ) \, \pi ^*(S) \, \pi (S^c) \\{} & {} \qquad {} \,+\, \big (\pi (S^c)-\pi \wedge \pi ^*(S^c)\big ) \, \big ( \pi \wedge \pi ^*(S) \,-\, \pi (S) \pi ^*(S) \big ) \\{} & {} \quad = \pi (S^c) \, \big ( \pi \wedge \pi ^*(S) \,-\, \pi \wedge \pi ^*(S) \, \pi ^*(S) \big )\\{} & {} \qquad \,+\, \pi \wedge \pi ^*(S^c) \, \big ( \pi (S) \pi ^*(S) \,-\, \pi \wedge \pi ^*(S) \big ), \end{aligned}$$
and is therefore non-negative. This proves (4.3). It follows from (3.4) that, for $y,y^*\in S$ such that $P_X(\{y^*\})>0$,
$$\begin{aligned} \widetilde{P}\big ( \widetilde{X}_{t+1}= & {} \widetilde{X}_{t+1}^*=z \,\big |\, \widetilde{X}_t=y, \widetilde{X}_t^*=y^* \big ) = \pi (y,z)\wedge \pi ^*(y^*,z) \nonumber \\\ge & {} \kappa \, Q\big ( \{z\} \big ) \,+\, O_P\big ( n^{-1/3} \, \log n \big ). \end{aligned}$$
(4.4)
We obtain from (4.3) and (4.4) there exist some $\kappa ^*>0$ such that
$$\begin{aligned}{} & {} { \widetilde{P}\big ( (\widetilde{X}_{t+1},\widetilde{X}_{t+1}^*)=(z,z) \,\big |\, \widetilde{X}_{t-1}=x, \widetilde{X}_{t-1}^*=x^* \big ) } \nonumber \\{} & {} \quad \ge \sum _{y,y^*\in S} \widetilde{P}\big ( (\widetilde{X}_{t+1},\widetilde{X}_{t+1}^*)=(z,z) \,\big |\, \widetilde{X}_t=y, \widetilde{X}_t^*=y^* \big ) \,\nonumber \\{} & {} \qquad \widetilde{P}\big ( (\widetilde{X}_t,\widetilde{X}_t^*)=(y,y^*) \,\big |\, \widetilde{X}_{t-1}=x, \widetilde{X}_{t-1}^*=x^* \big ) \nonumber \\{} & {} \quad \ge \kappa ^* \end{aligned}$$
(4.5)
holds with a probability tending to 1. Hence, with a probability tending to 1, the coupled process is $\phi $-mixing with geometrically decaying coefficients.
(iii)
According to (4.5), the coupled process $\big ((\widetilde{X}_t,\widetilde{X}_t^*)\big )_{t\in {\mathbb N}_0}$ satisfies Doeblin’s condition which implies in particular that this process has a unique stationary distribution. Let $\big ((\widetilde{X}_t^0,\widetilde{X}_t^{*,0})\big )_{t\in {\mathbb N}_0}$ be a stationary version of the coupled process. Since $\big ((\widetilde{X}_t,\widetilde{X}_t^*)\big )_{t\in {\mathbb N}_0}$ is geometrically ergodic we obtain

$$\begin{aligned} \widetilde{P}\big ( \widetilde{X}_t^0 \ne \widetilde{X}_t^{*,0} \big )\le & {} \widetilde{P}\big ( \widetilde{X}_{K_n} \ne \widetilde{X}_{K_n}^* \big ) \,+\, d_{TV}\big ( \widetilde{P}^{(\widetilde{X}_{K_n}^0,\widetilde{X}_{K_n}^{*,0})}, \widetilde{P}^{(\widetilde{X}_{K_n},\widetilde{X}_{K_n}^*)} \big ) \\= & {} O_{\widetilde{P}}\big ( n^{-1/3} \, (\log n)^2 \big ). \end{aligned}$$

$\square $

4.2 Some auxiliary lemmas

Lemma 4.1

Suppose that $(X_t)_{t\in {\mathbb N}_0}$ is a Markov chain with state space $D\subseteq {\mathbb R}$ such that (A2) is fulfilled. For arbitrary $I\subseteq D$, let

$$\begin{aligned} \eta _t:=\, \big [ {\mathbb {1}}(X_t\le z) \,-\, P(X_t\le z\mid X_{t-1}) \big ] \; {\mathbb {1}}(X_{t-1}\in I), \end{aligned}$$

where $I\subseteq D$. Then, for arbitrary $\gamma <1$,

$$\begin{aligned} E\left[ \left( \sum _{t=1}^n \eta _t \right) ^4 \right] \,=\, O\big ( (n\, p_I)^2 \,+\, n\, p_I^\gamma \big ), \end{aligned}$$

where $p_I:=P(X_0\in I)$.

Proof

In view of $E\big [\big (\sum _{t=1}^n \eta _t\big )^4\big ]=\sum _{s,t,u,v=1}^n E[\eta _s \eta _t \eta _u \eta _v]$ we first consider the terms $E[\eta _s \eta _t \eta _u \eta _v]$. Let the indices be chronologically ordered, i.e. $1\le s\le t\le u\le v\le n$. Then it follows from the Markov property that

$$\begin{aligned} E\big [ \eta _s \eta _t \eta _u \eta _v \big ] \,=\, 0 \quad \hbox { if }\quad u<v. \end{aligned}$$

Considering the remaining cases of $s\le t\le u=v$, we make use of the following equalities.

(a)
$s=t=u=v$ Then $E[\eta _s \eta _t \eta _u \eta _v] \,=\, E\big [\eta _s^4\big ]$.
(b)
$s=t<u=v$ Then $E[\eta _s \eta _t \eta _u \eta _v] \,=\, \mathop {\textrm{cov}}\nolimits (\eta _s^2, \eta _u^2) \,+\, E\big [\eta _s^2\big ]\,E\big [\eta _u^2\big ]$.
(c)
$s<t\le u=v$ Then $E[\eta _s \eta _t \eta _u \eta _v] \,=\, \mathop {\textrm{cov}}\nolimits (\eta _s, \eta _t \eta _u^2) \,=\, \mathop {\textrm{cov}}\nolimits (\eta _s \eta _t, \eta _u^2)$.

For $s<u$, there exist ${4 \atopwithdelims ()2}=6$ quadrupels $(t_1,t_2,t_3,t_4)$ such that $t_i=t_j=s$ for some $i\ne j$, and $t_k=t_l=u$ for some $k\ne l$. For $s<t<u$, there exist $4\cdot 3=12$ quadrupels $(t_1,t_2,t_3,t_4)$ such that $t_i=s$, $t_j=t$ and $t_k=t_l=u$ for some i, j, k, l, $k\ne l$. Finally, for $s<t=u$, there exist 4 quadrupels $(t_1,t_2,t_3,t_4)$ such that $t_i=s$ for some i and $t_j=u$ for $j\ne i$. Therefore we obtain

$$\begin{aligned} E\left[ \left( \sum _{t=1}^n \eta _t \right) ^4 \right]\le & {} \sum _{t=1}^n E\big [ \eta _t^4 \big ] + 6\, \sum _{1\le s<u\le n} E\big [ \eta _s^2 \big ] \, E\big [ \eta _u^2 \big ] \nonumber \\{} & {} {} \,+\, 12\, \sum _{r=1}^{n-1} \sum _{(s,t,u)\in {\mathcal T}_{n,r}^{(1)}} \big | \mathop {\textrm{cov}}\nolimits ( \eta _s, \eta _t \eta _u^2) \big | \nonumber \\{} & {} {} \,+\, 12\, \sum _{r=1}^{n-1} \sum _{(s,t,u)\in {\mathcal T}_{n,r}^{(2)}} \big | \mathop {\textrm{cov}}\nolimits ( \eta _s \eta _t, \eta _u^2) \big |, \end{aligned}$$

(4.6)

where

$$\begin{aligned} {\mathcal T}_{n,r}^{(1)}= & {} \big \{ (s,t,u):\; 1\le s<t\le u\le n, \; r:=t-s\ge u-t\big \} \\ {\mathcal T}_{n,r}^{(2)}= & {} \big \{ (s,t,u):\; 1\le s\le t<u\le n, \; r:=u-t> t-s\big \}. \end{aligned}$$

To estimate the last two terms on the right-hand side of (4.6) we use a well-known covariance inequality for $\alpha $-mixing random variables,

$$\begin{aligned} \big | \mathop {\textrm{cov}}\nolimits (X,Y) \big | \,\le \, 4\, \big [ \alpha (\sigma (X),\sigma (Y)) \big ]^{1-1/\alpha -1/\beta } \,\Vert X\Vert _\alpha \,\Vert Y\Vert _\beta , \end{aligned}$$

where $\alpha ,\beta \in (1,\infty )$ are such that $1/\alpha +1/\beta <1$ and $\Vert X\Vert _\alpha <\infty $, $\Vert Y\Vert _\beta <\infty $; see e.g. Bradley (2007a, Corollary 10.16). Choosing $\alpha =\beta =2/\gamma $ and taking into account that $|\eta _s|\le 1$ and $E|\eta _s|\le p_I$ we obtain that

$$\begin{aligned} \big | \mathop {\textrm{cov}}\nolimits ( \eta _s, \eta _t \eta _u^2 ) \big |\le & {} \big [ \alpha _X(t-s-1) \big ]^{1-\gamma /2-\gamma /2} \, \Vert \eta _s \Vert _{2/\gamma } \, \Vert \eta _t \eta _u^2 \Vert _{2/\gamma } \\\le & {} \big [ \alpha _X(t-s-1) \big ]^{1-\gamma } \, p_I^\gamma \end{aligned}$$

as well as

$$\begin{aligned} \big | \mathop {\textrm{cov}}\nolimits ( \eta _s \eta _t, \eta _u^2 ) \big | \,\le \, \big [ \alpha _X(u-t-1) \big ]^{1-\gamma } \, p_I^\gamma . \end{aligned}$$

Using $\#{\mathcal T}_{n,r}^{(1)}\le n(r+1)$ and $\#{\mathcal T}_{n,r}^{(2)}\le nr$ we obtain from (4.6)

$$\begin{aligned} E\left[ \left( \sum _{t=1}^n \eta _t \right) ^4 \right]\le & {} n\, p_I \,+\, 6\, (n\, p_I)^2 \\{} & {} \,+\, 12\, n\, \sum _{r=1}^{n-1} (2r+1) \big [ \alpha _X(r-1) \big ]^{1-\gamma } \, p_I^\gamma \\= & {} O\Big ( (n\, p_I)^2 \,+\, n\, p_I^\gamma \Big ), \end{aligned}$$

which completes the proof. $\square $

Lemma 4.2

Suppose that $(X_t)_{t\in {\mathbb N}_0}$ is a Markov chain with state space $D\subseteq {\mathbb R}$ and stationary distribution $P_X$ such that (A2) is fulfilled. For arbitrary $I\subseteq D$, let $N_n(I):=\#\{t\le n:\, X_{t-1}\in I\}$. Then, for arbitrary $\delta >0$, $\kappa <\infty $, and $P_X(I)\ge n^{\delta -1}$,

$$\begin{aligned} P\big ( |N_n(I) \,-\, n\,P_X(I)| \,>\, n\, P_X(I)/2 \big ) \,=\, O\big ( n^{-\kappa } \big ). \end{aligned}$$

Proof

Let $q\in 2{\mathbb N}$ and $\epsilon >0$. Since

$$\begin{aligned} \sum _{r=1}^\infty r^{q-2} \, [\alpha _X(r)]^{\epsilon /(q+\epsilon )} \,<\, \infty \end{aligned}$$

it follows from an extension of Rosenthal’s inequality (see e.g. Theorem 2 in Section 1.4.1 in Doukhan (1994)) that

$$\begin{aligned} E\Big [ \big | N_n(I) \,-\, n\, P_X(I) \big |^q \Big ]= & {} E\left[ \Big | \sum _{t=1}^n \big ({\mathbb {1}}(X_{t-1}\in I) \,-\, P_X(I)\big ) \Big |^q \right] \nonumber \\\le & {} C_q\, \big \{ n^{q/2}\, P_X(I)^{q/(2+\epsilon )} \,+\, n\, P_X(I)^{q/(q+\epsilon )} \big \}. \nonumber \\ \end{aligned}$$

(4.7)

Choosing $\epsilon >0$ small enough we have that $n\,P_X(I)^{2-2/(2+\epsilon )}\ge n^{\delta '}$ for some $\delta '>0$. Therefore we obtain from Markov’s inequality that

$$\begin{aligned}{} & {} { P\big ( |N_n(I) \,-\, n\, P_X(I)| \,>\, n\, P_X(I)/2 \big ) } \\{} & {} \quad \le C_q \; \frac{ n^{q/2}\, P_X(I)^{q/(2+\epsilon )} \,+\, n\, P_X(I)^{q/(q+\epsilon )} }{ (n\, \P _X(I)/2)^q } \\{} & {} \quad = O\Big ( \big (n\, P_X(I)^{1-1/(2+\epsilon )}\big )^{-q/2} \,+\, n\; (n \, P_X(I))^{-q} \Big ) =\, O\big ( n^{-\kappa } \big ), \end{aligned}$$

if q is chosen sufficiently large. $\square $

Lemma 4.3

Suppose that $(X_t)_{t\in {\mathbb N}_0}$ is a Markov chain with state space $D\subseteq {\mathbb R}$ and stationary distribution $P_X$ such that (A2) is fulfilled. Then there exist some $C<\infty $ such that, for arbitrary $\underline{x}\le \overline{x}$ with $P_X([\underline{x},\overline{x}])\ge n^{-1/3}$,

$$\begin{aligned} E\left[ \sup _{v:\, v\ge \overline{x}} \frac{ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}(X_{t-1}\in [\underline{x},v]) \big | }{ \#\{t\le n:\; X_{t-1}\in [\underline{x},v]\} \vee 1 } \right] \,\le \, C\, n^{-1/3} \nonumber \\ \end{aligned}$$

(4.8a)

and

$$\begin{aligned} E\bigg [ \sup _{u:\, u\le \underline{x}} \frac{ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}(X_{t-1}\in [u,\overline{x}]) \big | }{ \#\{t\le n:\; X_{t-1}\in [u,\overline{x}]\} \vee 1 } \bigg ] \,\le \, C\, n^{-1/3}. \nonumber \\ \end{aligned}$$

(4.8b)

Proof

We prove only (4.8a) since the proof of (4.8b) is completely analogous. The proof is carried out in two steps. First we consider the technically simpler case where the distribution function $F_X$ is continuous. This allows us to define a suitable dyadic family of intervals which leads to a readily comprehensible proof. Afterwards we extend the result to the general case.

Step 1 Suppose that $F_X$ is continuous. First we prove that for arbitrary $\delta >0$ and each $v\ge \overline{x}$ there exists some $C<\infty $ such that

$$\begin{aligned}{} & {} E\left[ \sup _{x:\, \underline{x}\le x\le v} \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P( X_t\le z\mid X_{t-1} ) \big ] {\mathbb {1}}( X_{t-1} \in [\underline{x},x] ) \big | \right] \nonumber \\{} & {} \quad \,\le \, C\, \sqrt{n\, P_X( [\underline{x},v] )} \,+\, n^\delta . \end{aligned}$$

(4.9)

To deal with the supremum we define a suitable system of dyadic intervals. Let $J_n\in {\mathbb N}$ be such that $n^{\delta -1}/2< 2^{-J_n}P_X([\underline{x},v])\le n^{\delta -1}$. For $j=1,2,\ldots ,J_n$ and $k=1,2,\ldots ,2^j$, we set

$$\begin{aligned} x_{j,k} \,=\, F_X^{-1}\big ( F_X(\underline{x}) + k2^{-j} P_X([\underline{x},v]) \big ) \end{aligned}$$

(4.10a)

and, for $j=1,\ldots ,J_n$,

$$\begin{aligned} B_{j,k} \,=\, \left\{ \begin{array}{ll} [ \underline{x}, x_{j,1} ] &{} \hbox { if }\quad k=1, \\ ( x_{j,k-1}, x_{j,k}] &{} \hbox { if }\quad k=2,\ldots ,2^j. \end{array} \right. \end{aligned}$$

(4.10b)

We have that

$$\begin{aligned} P_X\big ( B_{j,k} \big ) \,=\, 2^{-j} P_X([\underline{x},v]). \end{aligned}$$

(4.11)

We define partial sume as

$$\begin{aligned} S_{j,k} \,=\, \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1})\big ] \, {\mathbb {1}}(X_{t-1}\in B_{j,k}). \end{aligned}$$

Choosing $\gamma $ such that $(1-2\delta )/(1-\delta )\le \gamma <1$ we have that $n(2^{-j}P_X([\underline{x},v]))^\gamma =O\big ( (n2^{-j}P_X([\underline{x},v]))^2 \big )$ for all $j=1,\ldots ,J_n$. Hence, the first term in the bound given in Lemma 4.1 dominates the second and we obtain, for $j=1,\ldots ,J_n,\; k=1,\ldots ,2^j$,

$$\begin{aligned} E\big [ S_{j,k}^4 \big ] \,=\, O\Big ( \big ( n2^{-j} P_X([\underline{x},v]) \big )^2 \Big ), \end{aligned}$$

which implies that

$$\begin{aligned}{} & {} { E\Big [ \big | S_{j,k} \big |\, {\mathbb {1}}\big ( |S_{j,k}|> \sqrt{n P_X([\underline{x},v])}\, 2^{-j/4} \big )\Big ] } \\{} & {} \quad \le \frac{ E\big [ S_{j,k}^4 \big ] }{ \big ( \sqrt{n P_X([\underline{x},v])}\, 2^{-j/4} \big )^3 } \,=\, O\Big ( \sqrt{n P_X([\underline{x},v])}\, 2^{-5j/4} \Big ). \end{aligned}$$

Therefore, we obtain that

$$\begin{aligned}{} & {} { E\Big [ \max _{1\le k\le 2^j} \big | S_{j,k} \big | \Big ] } \nonumber \\{} & {} \quad \le \sqrt{n P_X([\underline{x},v])} \, 2^{-j/4} +\, \sum _{k=1}^{2^j} E\Big [ \big | S_{j,k} \big |\, {\mathbb {1}}\big ( |S_{j,k}|> \sqrt{n P_X([\underline{x},v])}\, 2^{-j/4} \big )\Big ] \nonumber \\{} & {} \quad = O\Big ( \sqrt{n P_X([\underline{x},v])}\, 2^{-j/4} \Big ). \end{aligned}$$

(4.12)

At the finest scale $J_n$, we define for $k=1,\ldots ,J_n$,

$$\begin{aligned} N_{J_n,k} \,=\, \#\big \{ t\le n:\;\; X_{t-1}\in B_{J_n,k} \big \}. \end{aligned}$$

Note that $EN_{J_n,k}=n2^{-J_n}P_X([\underline{x},v])\le n^\delta $. We obtain from (4.7) that

$$\begin{aligned}{} & {} E\big [ N_{J_n,k} \, {\mathbb {1}}( N_{J_n,k}> 2n^\delta ) \big ]\\{} & {} \quad \le E\Big [ \big | N_{J_n,k} - EN_{J_n,k} \big | \, {\mathbb {1}}\big ( |N_{J_n,k} - EN_{J_n,k}|>n^\delta \big ) \Big ] \\{} & {} \quad \le \frac{ C_q\, \big \{ n^{q/2}\, (2^{-J_n}P_X([\underline{x},v]))^{q/(2+\epsilon )} \,+\, n\, (2^{-J_n}P_X([\underline{x},v]))^{q/(q+\epsilon )} \big \} }{ n^{\delta (q-1)} } \\{} & {} \quad = O\big ( n^{-\kappa } \big ) \end{aligned}$$

holds for arbitrary $\kappa <\infty $ if q is chosen large enough. Since $2^{J_n}<n^{1-\delta }/(2 P_X([\underline{x},v]))\le n^{4/3-\delta }/2$ we obtain

$$\begin{aligned} E\left[ \max _{1\le k\le 2^{J_n}} \big \{ N_{J_n,k} \big \} \right] \,=\, 2\, n^\delta \,+\, \sum _{k=1}^{2^{J_n}} E\big [ N_{J_n,k} \, {\mathbb {1}}( N_{J_n,k}> 2n^\delta ) \big ] \,=\, O\big ( n^\delta \big ).\qquad \end{aligned}$$

(4.13)

After these preparatory steps we are in a position to estimate the expected value of the supremum. For arbitrary $x\in [\underline{x},v]$, there exist p and $(j_1,k_1),\ldots ,(j_p,k_p),k$, $1\le j_1<\cdots <j_p\le J_n$, such that $B_{j_1,k_1},\ldots ,B_{j_p,k_p},B_{J_n,k}$ are adjacent intervals and

$$\begin{aligned} B_{j_1,k_1} \cup \cdots \cup B_{j_p,k_p} \subseteq [\underline{x},x] \subseteq B_{j_1,k_1} \cup \cdots \cup B_{j_p,k_p} \cup B_{J_n,k}. \end{aligned}$$

This implies that

$$\begin{aligned} \Big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P( X_t\le z\mid X_{t-1} ) \big ] {\mathbb {1}}\big ( X_{t-1}\in [\underline{x}, x] \big ) \Big | \,\le \, \sum _{l=1}^p \big | S_{j_l,k_l} \big | \,+\, N_{J_n,k} \end{aligned}$$

and, therefore,

$$\begin{aligned}{} & {} E\left[ \sup _{x:\, \underline{x}\le x\le v} \Big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P( X_t\le z\mid X_{t-1} ) \big ] {\mathbb {1}}\big ( X_{t-1}\in [\underline{x}, x] \big ) \Big | \right] \\{} & {} \quad = \sum _{j=1}^{J_n} E\left[ \max _{1\le k\le 2^j} \big \{ |S_{j,k}| \big \} \Big ] +\, E\Big [ \max _{1\le k\le 2^{J_n}} \big \{ N_{J_n,k} \big \} \right] . \end{aligned}$$

It follows from (4.12) and (4.13) that (4.9) is fulfilled.

Now we are in a position to prove (4.8a). We define a dyadic sequence of growing intervals, $I_0=[\underline{x},\overline{x}]$ and, for $j\ge 1$

$$\begin{aligned} I_j \,=\, \big [ \underline{x}, F_X^{-1}\big ( F_X(\underline{x}) \,+\, 2^j P_X([\underline{x},\overline{x}]) \big ) \big ]. \end{aligned}$$

(There exists some $K_n\ge 0$ such that $P_X(I_j)=2^j P_X([\underline{x},\overline{x}])$ for $j=0,\ldots ,K_n$ and $P_X(I_{K_n+1})<2^{K_n+1}P_X(\underline{x},\overline{x}])$. Then $I_{K_n+1}=I_{K_n+2}=\ldots $.) Define the event

$$\begin{aligned} A_n \,=\, \Big \{ \big \{\omega :\; \#\{t\le n:\; X_{t-1}(\omega )\in I_j\} \ge n\, P_X(I_j)/2 \;\; \hbox { for all } j=0,\ldots ,K_n \big \} \Big \}. \end{aligned}$$

For $x\in I_{j+1}\setminus I_j$ we use the estimate

$$\begin{aligned}{} & {} { \frac{ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}\big ( X_{t-1}\in [\underline{x}, x] \big ) \big | }{ \#\big \{ t\le n:\; X_{t-1}\in [\underline{x},x] \big \} \vee 1 } } \\{} & {} \quad \le \frac{ \sup _{x\in I_{j+1}} \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}\big ( X_{t-1}\in [\underline{x},x] \big ) \big | }{ \#\big \{ t\le n:\; X_{t-1}\in I_j \big \} \vee 1 }. \end{aligned}$$

It follows from Lemma 4.3 that $P\big ( A_n^c \big )=O\big ( n^{-1/3} \big )$, which implies that

$$\begin{aligned}{} & {} { E\left[ \sup _{x:\, \underline{x}\le x\le v} \frac{ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}\big ( X_{t-1}\in [\underline{x}, x] \big ) \big | }{ \#\big \{ t\le n:\; X_{t-1}\in [\underline{x},x] \big \} \vee 1 } \right] } \nonumber \\{} & {} \quad \le E\left[ \sup _{x:\, \underline{x}\le x\le v} \frac{ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}\big ( X_{t-1}\in [\underline{x}, x] \big ) \big | }{ \#\big \{ t\le n:\; X_{t-1}\in [\underline{x},x] \big \} } \;\; {\mathbb {1}}_{A_n} \right] \nonumber \\{} & {} \qquad \,+\, P\big ( A_n^c \big ) \nonumber \\{} & {} \quad = \sum _{j=0}^{K_n} \frac{ E\Big [ \big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}\big ( X_{t-1}\in I_{j+1} \big ) \big | \Big ] }{ n\; P_X(I_{j-1})/2 } \,+\, O\big ( n^{-1/3} \big ) \nonumber \\{} & {} \quad = O\left( \sum _{j=0}^{K_n} 2^{-j/2} / \sqrt{ n P_X([\underline{x},\overline{x}]) } \right) \nonumber \\{} & {} \qquad \,+\, O\big ( n^{-1/3} \big ) \,=\, O\big ( n^{-1/3} \big ), \end{aligned}$$

(4.14)

i.e. (4.8a) is fulfilled.

Step 2 In case of a general stationary distribution $P_X$, the definition according to (4.10a) and (4.10b) does no longer guarantee that the convenient property of $P_X( B_{j,k} )=2^{-j}P_X([\underline{x},v])$ holds true. In order to draw on the calculations in Step 1 we act as follows. Let $(V_t)_{t\in {\mathbb N}_0}$ be a sequence of independent random variables following a uniform distribution on [0, 1], which is independent of the process $(X_t)_{t\in {\mathbb N}_0}$. For the latter process we define an accompanying sequence $(U_t)_{t\in {\mathbb N}_0}$ of uniformly distributed random variables, where $U_t$ depends on the pair $(X_t,V_t)$ as follows. If $F_X$ is continuous in the point $X_t$, then we simply set

$$\begin{aligned} U_t:=\, F_X(X_t). \end{aligned}$$

Otherwise, if $F_X$ is discontinuous in $X_t$, then $P_X(\{X_t\})=F_X(X_t)-F_X(X_t-0)>0$ and we set

$$\begin{aligned} U_t:=\, F_X(X_t) \,-\, V_t/P_X(\{X_t\}). \end{aligned}$$

In both cases we have that

$$\begin{aligned} X_t \,=\, F_X^{-1}(U_t), \end{aligned}$$

where $G^{-1}(t)=\inf \{x:G(x)\ge t\}$ denotes the generalized inverse of a generic distribution function G. Since $F_X$ has at most countably many discontinuity points, it follows that the mapping $(X_t,V_t)\mapsto U_t$ is measurable. It also follows that $U_t$ has a uniform distribution on [0, 1]. Furthermore the process $\big ( (X_t,V_t) \big )_{t\in {\mathbb N}_0}$ has the same mixing properties as $\big (X_t\big )_{t\in {\mathbb N}_0}$, i.e.

$$\begin{aligned} \alpha _{(X,V)}(r) \,=\, \alpha _X(r) \qquad \forall r\ge 1; \end{aligned}$$

see e.g. Lemma 8 in Bradley (1981). Now we obtain in complete analogy to the calculations leading to (4.9) in Step 1 that, for arbitrary $0\le \underline{u}<\overline{u}\le 1$,

$$\begin{aligned}{} & {} E\bigg [ \sup _{u:\, \underline{u}\le u\le \overline{u}} \Big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}(U_{t-1}\in [\underline{u},u]) \Big | \bigg ] \nonumber \\{} & {} \quad \,\le \, C \sqrt{n\,(\overline{u}-\underline{u})} \,+\, n^\delta . \end{aligned}$$

(4.15)

It is easy to see that the following inclusions hold true for $\underline{x}\le x$:

$$\begin{aligned} \big \{ F_X(\underline{x}-0) < U_{t-1} \le F_X(x) \big \}\subseteq & {} \big \{ \underline{x} \le X_{t-1} \le x \big \}\\\subseteq & {} \big \{ F_X(\underline{x}-0) \le U_{t-1} \le F_X(x) \big \}. \end{aligned}$$

Indeed, the second inclusion follows immediately from the construction of $U_{t-1}$. Regarding the first one, note that it follows again from the construction of $U_{t-1}$ that $F_X(\underline{x}-0) < U_{t-1}$ implies $\underline{x}\le X_{t-1}$. Furthermore, $U_{t-1}\le F_X(x)$ implies $X_{t-1}=F_X^{-1}(U_{t-1})\le F_X^{-1}(F_X(x))\le x$. Since $P(U_{t-1}=F_X(\underline{x}))=0$ we conclude that

$$\begin{aligned}{} & {} { \sup _{x:\, \underline{x}\le x\le \overline{x}} \Big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}(X_{t-1}\in [\underline{x},x]) \Big | } \\{} & {} \quad \le \sup _{u:\, F_X(\underline{x}-0)\le u\le F_X(\overline{x})} \Big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ]\\{} & {} \qquad {\mathbb {1}}(U_{t-1}\in [F_X(\underline{x}-0),F_X(x)]) \Big | \end{aligned}$$

holds with probability one. Hence, we obtain from (4.13)

$$\begin{aligned}{} & {} { E\bigg [ \sup _{x:\, \underline{x}\le x\le v} \Big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ] {\mathbb {1}}(X_{t-1}\in [\underline{x},x]) \Big | \bigg ] } \\{} & {} \quad \le E\bigg [ \sup _{u:\, F_X(\underline{x}-0)\le u\le F_X(v)} \Big | \sum _{t=1}^n \big [ {\mathbb {1}}(X_t\le z) - P(X_t\le z\mid X_{t-1}) \big ]\\{} & {} \qquad {\mathbb {1}}(U_{t-1}\in [F_X(\underline{x}-0),F_X(x)]) \Big | \bigg ] \\{} & {} \quad \le C\, \sqrt{n \, \big (F_X(v)-F_X(\underline{x}-0)\big ) } \,+\, n^\delta \\{} & {} \quad = C\, \sqrt{n \, P_X\big ( [\underline{x},v] \big ) } \,+\, n^\delta , \end{aligned}$$

i.e. (4.9) holds true. $\square $

References

Al-Osh M, Alzaid A (1987) First-order integer-valued autoregressive (INAR(1)) processes. J Time Ser Anal 8(3):261–275
Article MathSciNet Google Scholar
Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E (1955) An empirical distribution function for sampling with incomplete information. Ann Math Stat 26(4):641–647
Article MathSciNet Google Scholar
Bickel PJ, Freedman DA (1981) Some asymptotic theory for the bootstrap. Ann Stat 9(6):1196–1217
Article MathSciNet Google Scholar
Bradley RC (1981) Central limit theorems under weak dependence. J Multivar Anal 11:1–16
Article MathSciNet Google Scholar
Bradley RC (2007a) Introduction to strong mixing conditions, vol I. Kendrick Press, Heber City
Bradley RC (2007b) Introduction to strong mixing conditions, vol II. Kendrick Press, Heber City
Brunk HD (1955) Maximum likelihood estimates of monotone parameters. Ann Math Stat 26(4):607–616
Article MathSciNet Google Scholar
Canonne CL (2017) A short note on Poisson tail bounds. http://www.cs.columbia.edu/ccanonne/files/misc/2017-poissonconcentration.pdf. Accessed 20 April 2022
Dehling H, Mikosch T (1994) Random quadratic forms and the bootstrap for $U$-statistics. J Multivar Anal 51:392–413
Article MathSciNet Google Scholar
Deng H, Zhang C-H (2020) Isotonic regression in multi-dimensional spaces and graphs. Ann Stat 48(6):3672–3698
Article MathSciNet Google Scholar
Doukhan P (1994) Mixing: properties and examples. Lecture notes in statistics, vol 85. Springer, Berlin
Google Scholar
Durot C (2002) Sharp asymptotics for isotonic regression. Probab Theory Relat Fields 122:222–240
Article MathSciNet Google Scholar
Freedman DA (1981) Bootstrapping regression models. Ann Stat 9(6):1218–1228
Article MathSciNet Google Scholar
Leucht A, Neumann MH (2009) Consistency of general bootstrap methods for degenerate U-and V-type statistics. J Multivar Anal 100:1622–1633
Article MathSciNet Google Scholar
Leucht A, Neumann MH (2013) Dependent wild bootstrap for degenerate $U$- and $V$-statistics. J Multivar Anal 117:257–280
Article MathSciNet Google Scholar
Leucht A, Neumann MH, Kreiss J-P (2015) A model specification test for GARCH(1,1) processes. Scand J Stat 42:1167–1193
Article MathSciNet Google Scholar
Lindvall T (1992) Lectures on the coupling method. Wiley, New York
Google Scholar
McKenzie E (1985) Some simple models for discrete variate time series. Water Resour Bull 21(4):645–650
Article Google Scholar
Mösching A, Dümbgen L (2020) Monotone least squares and isotonic quantiles. Electron J Stat 14:24–49
Article MathSciNet Google Scholar
Neumann MH (2021) Bootstrap for integer-valued GARCH($p$,$q$) processes. Stat Neerl 75(3):343–363
Article MathSciNet Google Scholar
Pakes AG (1971) Branching processes with immigration. J Appl Probab 8(1):32–42
Article MathSciNet Google Scholar
Paparoditis E, Politis DN (2002) The local bootstrap for Markov processes. J Stat Plan Inference 108:301–328
Article MathSciNet Google Scholar
Rajarshi MB (1990) Bootstrap in Markov-sequences based on estimates of transition density. Ann Inst Stat Math 42:253–268
Article MathSciNet Google Scholar
Robertson T, Wright FT, Dykstra RL (1988) Order restricted statistical inference. Wiley, New York
Google Scholar
van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Book Google Scholar
Zhang C-H (2002) Risk bounds in isotonic regression. Ann Stat 30(2):528–555
Article MathSciNet Google Scholar

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institut für Mathematik, Friedrich-Schiller-Universität Jena, Ernst-Abbe-Platz 2, 07743, Jena, Germany
Michael H. Neumann

Authors

Michael H. Neumann
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Michael H. Neumann.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Neumann, M.H. Estimation and bootstrap for stochastically monotone Markov processes. Metrika 87, 31–59 (2024). https://doi.org/10.1007/s00184-023-00903-7

Download citation

Received: 09 May 2022
Accepted: 09 February 2023
Published: 28 February 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s00184-023-00903-7

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Estimation and bootstrap for stochastically monotone Markov processes

Abstract

Similar content being viewed by others

Martingale Estimating Functions for Stochastic Processes: A Review Toward a Unifying Tool

Bootstrap for Maximum Likelihood Estimates of PARMA Coefficients

Approximating Markov Chains for Bootstrapping and Simulation

1 Introduction

2 An estimator of a monotone family of distribution functions

Theorem 2.1

3 A new bootstrap method for Markov chains

Theorem 3.1

Remark 1

4 Proofs

4.1 Proofs of the main results

Proof of Theorem 2.1

Proof of Theorem 3.1

4.2 Some auxiliary lemmas

Lemma 4.1

Proof

Lemma 4.2

Proof

Lemma 4.3

Proof

References

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Estimation and bootstrap for stochastically monotone Markov processes

Abstract

Similar content being viewed by others

Martingale Estimating Functions for Stochastic Processes: A Review Toward a Unifying Tool

Bootstrap for Maximum Likelihood Estimates of PARMA Coefficients

Approximating Markov Chains for Bootstrapping and Simulation

1 Introduction

2 An estimator of a monotone family of distribution functions

Theorem 2.1

3 A new bootstrap method for Markov chains

Theorem 3.1

Remark 1

4 Proofs

4.1 Proofs of the main results

Proof of Theorem 2.1

Proof of Theorem 3.1

4.2 Some auxiliary lemmas

Lemma 4.1

Proof

Lemma 4.2

Proof

Lemma 4.3

Proof

References

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation