1 Introduction

Nonparametric estimation has been the subject of intense investigation for many years, and this has led to the development of a large variety of methods. Owing to its numerous applications and its important role in mathematical statistics, the problem of estimating density and regression functions has attracted considerable interest during the last decades. One of the most commonly used classes of estimators is that formed by the so-called kernel-type estimators. For theoretical aspects along with statistical applications, the interested reader is referred to Tapia and Thompson [71], Wertz [74], Devroye and Györfi [23], Devroye [22], Nadaraya [57], Härdle [38], Wand and Jones [73], Eggermont and LaRiccia [30], Devroye and Lugosi [24] and the references therein. Recently, a number of statistical problems have found unexpected solutions when investigated from a “modal point of view”; this includes such classical procedures as clustering, and it has led to renewed interest in estimation and inference for the mode. The estimation of the conditional mode of an outcome variable given the regressors is called modal regression. Modal regression is an alternative to the usual regression methods for exploring the relationship between a response variable \(\mathbf{Y}\) and a predictor variable \(\mathbf{X}\). Unlike conventional regression, which is based on the conditional mean of \(\mathbf{Y}\) given \(\mathbf{X} = \mathbf{x}\), modal regression estimates conditional modes of \(\mathbf{Y}\) given \(\mathbf{X} = \mathbf{x}\). Modal regression is a more reasonable modelling approach than the usual regression in at least two scenarios. The first is when the conditional density function is skewed or heavy-tailed: the conditional mean may then fail to provide a good summary of the relation between the response and the covariate. The second is when the conditional density function has multiple local modes, which occurs when the relation between \(\mathbf{X}\) and \(\mathbf{Y}\) contains multiple patterns; the conditional mean may not capture any of these patterns, so it can be a very poor summary; see Chen et al. [17] for an example. This situation had already been pointed out in Tarter and Lock [72]. Modal regression has a wide variety of applications, including the analysis of traffic and forest fire data [31, 75], econometrics [45, 50, 51], and machine learning [33, 68]. For example, Kemp and Santos Silva [45] argue that the mode is the most intuitive measure of central tendency for positively skewed data found in many econometric applications such as wages, prices, and expenditures [45, p. 93]. For more recent reviews and further details on the subject, the reader is referred to Chen [16] and Chacón [14].

We will start by providing some notation and definitions needed in the forthcoming sections. Let \((\mathbf{X}_\mathrm{t},\mathbf{Y}_\mathrm{t})_{\mathrm{t}\ge 0}\) be an \(\mathbb {R}^{d}\times \mathbb {R}^{q}\)-valued strictly stationary and ergodic continuous time process defined on a probability space \((\Omega , \mathcal {F},\mathbb {P})\). Let \(g(\cdot ,\cdot )\) be the density function of the random vector \((\mathbf{X}_\mathrm{t},\mathbf{Y}_\mathrm{t})\), \(f(\cdot )\) the density of \(\mathbf{X}_\mathrm{t}\) and \(\rho (\cdot )\) the density of \(\mathbf{Y}_\mathrm{t}\). For a given measurable function \(\psi (\cdot )\) and \(\mathbf{x}\in \mathbb {R}^{d}\), the regression function, whenever it exists, is defined to be

$$\begin{aligned} m(\mathbf{x},\psi )=\mathbb {E}(\psi (\mathbf{Y})\mid \mathbf{X}=\mathbf{x}). \end{aligned}$$

In this situation, we have the random design regression model, and \(\mathbf{X}\) is called the design variable and \(\mathbf{Y}\) the response variable. The random design model is very important in clinical studies, where the design variable usually represents the age of a particular individual receiving treatment, and \(\mathbf{Y}\) is the quantity whose dependence on the age of the patient is investigated. A typical example (from forensic medicine) is given by Härdle and Marron [39], where \(\mathbf{Y}\) stands for the liver weight of female persons (depending on their age). The inequality \(\mathbf{x} \le \mathbf{y}\) is understood componentwise, i.e., \(x_{j}\le y_{j}\) for all \(j = 1,\ldots ,d\). The introduction of the function \(\psi (\cdot )\) allows us to include some important special cases:

  • \(\psi (\mathbf{Y}) = \mathbb {1}\{\mathbf{Y}\le \mathbf{y}\}\) gives the conditional distribution of \(\mathbf{Y}\) given \(\mathbf{X}=\mathbf{x}\).

  • \(\psi (\mathbf{Y}) =\mathbf{Y}^{k}\) gives the conditional moments of \(\mathbf{Y}\) given \(\mathbf{X}=\mathbf{x}\).

In the present paper, we focus on estimating the location \({\varvec{\Theta }}\) and the size \(m({\varvec{\Theta }},\psi )\) of a unique maximum (mode, peak) of the (unknown) function \(m(\cdot ,\psi )\). Our method is indirect in the sense that the estimators of \({\varvec{\Theta }}\) and \(m({\varvec{\Theta }},\psi )\) are based on a kernel estimator \(\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\) of the regression curve \(m(\mathbf{x},\psi )\). We will use the Nadaraya–Watson estimator which is defined by

$$\begin{aligned} \widehat{m}_\mathrm{T}(\mathbf{x},\psi ):=\left\{ \begin{array}{lcr} \frac{\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T\psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt}{\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt},&{}\text{ if }&{}\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\ne 0,\\ \displaystyle \frac{1}{T}\int _0^T\psi (\mathbf{Y}_\mathrm{t})dt, &{}\text{ if }&{}\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt= 0, \end{array}\right. \end{aligned}$$

where \(K(\cdot )\) is a kernel, \(h_\mathrm{T}\) is a positive sequence of real numbers such that

$$\begin{aligned} (i)\ \underset{T \rightarrow \infty }{\lim } h_\mathrm{T} = 0, \quad (ii)\ \underset{T \rightarrow \infty }{\lim } Th_\mathrm{T}^{d}= +\infty , \quad \text{ or }\quad (iii)\ \underset{T \rightarrow \infty }{\lim } \frac{\displaystyle Th_\mathrm{T}^{d}}{\displaystyle \log T}= + \infty . \end{aligned}$$
(1.1)

Condition (i) is used to obtain the asymptotic unbiasedness of kernel-type (density or regression) estimators. A more restrictive assumption on \(h_{T}\) is needed for consistency; this is given by condition (ii), see Parzen [61]. In general, strong consistency fails to hold when either (i) or (iii) is not satisfied. Now the location \({\varvec{\Theta }}\) (mode) and the size \(m({\varvec{\Theta }},\psi )\) are estimated by the respective functionals \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) and \(\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\) pertaining to \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\); i.e., \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) is chosen through the equation

$$\begin{aligned} \widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}, \psi ) = \sup _{\mathbf{x}\in \mathfrak {C}} \widehat{m}_\mathrm{T}(\mathbf{x},\psi ), \end{aligned}$$
(1.2)

where the supremum is taken over some compact set \(\mathfrak {C}\subset \mathbb {R}^{d}\). Note that \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) exists if \(K(\cdot )\) is continuous; however, it may not be unique. In fact, it is known that kernel estimators tend to produce some additional, superfluous modality. In this context, one can consider

$$\begin{aligned} \widehat{{{\varvec{\Theta }} }}_\mathrm{T}=\inf \left\{ \mathbf{t}\in \mathfrak {C} ~~\text{ such } \text{ that }~~ \widehat{m}_\mathrm{T}(\mathbf{t},\psi )=\sup _{\mathbf{x}\in \mathfrak {C}}\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\right\} , \end{aligned}$$

where the infimum is taken with respect to the lexicographic order on \(\mathbb {R}^{d}\). However, this has no bearing on the asymptotic theory; our results are valid for any choice of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) satisfying (1.2). To ensure both uniqueness and measurability of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), one could use the so-called mode functional on \(C(\mathfrak {C})\) apparently introduced by Eddy [28], which selects the infimum of the maximising locations and whose measurability is proved in the same paper. Alternatively, Grund and Hall [35] suggested breaking ties at random if necessary. In any case, the validity of our proofs is not affected by potential non-measurability of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), since we can always replace probabilities by outer probabilities where necessary with no further changes in the proofs; this issue is discussed in Ziegler [79], and also in Ziegler [77, 78] and Herrmann and Ziegler [41]. As mentioned in Ziegler [79], estimating the location and size of the maximum of a nonparametric curve by the corresponding functionals of a kernel estimator of the curve is not new; it stems from the closely related problem of estimating the mode of a density. In continuation of the pioneering work of Parzen [61] on density estimation and estimation of the mode, Eddy [28, 29] and Romano [66] tackled optimality questions for kernel density estimators of the mode. Romano [66] also seems to be the first to consider data-dependent bandwidths in this framework. In another paper, Romano [65] examined the limiting behaviour of bootstrap estimators of the location of the mode, an idea used later by Grund and Hall [35] in the context of bandwidth selection by minimising the bootstrapped \(L_{p}\)-error of the mode estimator. It is worth noticing that the conditional mode function estimate of the predictor was used for the first time by Collomb et al. [18]. Kernel-type estimators have been studied extensively under various dependence settings; we cite, among many others, Samanta and Thavaneswaran [67], Ould-Saïd [59], Quintela-Del-Río and Vieu [64], Berlinet et al. [5], Ferraty et al. [34], Ezzahrioui and Ould-Saïd [32], Benrabah et al. [3] and the references therein. Quintela-Del-Río and Vieu [64] motivated the use of the conditional mode by pointing out that prediction of \(\mathbf{Y}\)-values given the \(\mathbf{X}\)-values is usually achieved through regression function estimation. Finally, in the i.i.d. setting, the almost sure convergence along with the mean convergence of the conditional density estimator was obtained by Youndjé [76]. Ota et al. [60] proposed a new estimator of the conditional mode that avoids the curse of dimensionality and at the same time is computationally scalable, thereby complementing the existing methods above.
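Before proceeding, we note that the estimator (1.2) is straightforward to approximate numerically once the continuous-time integrals defining \(\widehat{m}_\mathrm{T}\) are replaced by Riemann sums over a discretised sample path. The following is a minimal sketch, assuming \(d=q=1\), a Gaussian kernel, and a synthetic data-generating process chosen purely for illustration; none of these choices is prescribed by the theory.

```python
import numpy as np

def nw_estimate(x_grid, X, Y, h, psi=lambda y: y):
    """Nadaraya-Watson estimate of m(x, psi) = E[psi(Y) | X = x]; the time
    integrals are approximated by averages over the discretised path."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    num = np.array([np.mean(psi(Y) * K((x - X) / h)) for x in x_grid])
    den = np.array([np.mean(K((x - X) / h)) for x in x_grid])
    # second branch of the definition: unconditional average where den vanishes
    return np.where(den > 0, num / np.maximum(den, 1e-300), np.mean(psi(Y)))

rng = np.random.default_rng(0)
T, dt = 500.0, 0.05
n = int(T / dt)                       # discretisation of [0, T]
X = rng.standard_normal(n)            # stand-in for an ergodic design path
Y = np.exp(-(X - 0.5) ** 2) + 0.1 * rng.standard_normal(n)  # peak near x = 0.5

grid = np.linspace(-2.0, 2.0, 401)    # the compact set C
h_T = T ** (-1.0 / 5.0)               # satisfies (1.1) for d = 1
m_hat = nw_estimate(grid, X, Y, h_T)
theta_hat = grid[np.argmax(m_hat)]    # location estimate, cf. (1.2)
size_hat = m_hat.max()                # size estimate m_hat(theta_hat, psi)
print(theta_hat, size_hat)
```

Note that `np.argmax` returns the first maximiser on the grid, which mimics the lexicographic-infimum convention above when ties occur.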

Within the framework described above, our aim is to establish consistency and asymptotic normality results (which in turn can be exploited for the construction of confidence intervals) for the estimators \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) and \(\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\) of the location and the size of the peak, under mild local smoothness conditions on the regression function \(m(\cdot ,\psi )\) and the design density \(f(\cdot )\) (mostly imposed locally in a neighbourhood of \({{\varvec{\Theta }} }\)). These results will be valid for a wide class of kernels, not necessarily compactly supported; this includes, in particular, the Gaussian kernel, which is widely used in practice. Mixing is a kind of asymptotic independence assumption, commonly adopted for the sake of simplicity, which can be unrealistic in situations where there is strong dependence between the data. Extending nonparametric functional ideas to general dependence structures is a rather underdeveloped field. Note that the ergodic framework avoids the widely used strong mixing condition and its variants as measures of dependence, together with the involved probabilistic calculations they imply (see, for instance, Masry [55]). It is worth noticing that ergodicity is implied by all the usual mixing conditions, being weaker than each of them; see, e.g., Remark 2.6 on page 50 in combination with Proposition 2.8 on page 51 in Bradley [13]. Further motivations to consider ergodic data are discussed in Laib and Louani [48, 49], Didi and Louani [27], Bouzebda et al. [12], Bouzebda et al. [8], Bouzebda and Didi [9,10,11] and Krebs [46]; in some of these references, the definitions of the ergodic property of continuous time processes are given. In the present work, we do not assume anything beyond ergodicity of the underlying process; hence the present work extends the scope of applications compared to the existing works. On the other hand, we mention that there exist interesting processes which are ergodic but not mixing, according to Andrews [1] and Bradley [13]. An example of an ergodic and non-mixing process was considered in Sect. 5.3 of Leucht and Neumann [52]. Indeed, let \(\{(T_{i},\lambda _{i}):i\in \mathbb {Z}\}\) be strictly stationary with \(T_{i}\mid \mathcal {T}_{i-1}\sim \text{ Poisson }(\lambda _{i})\), where \(\mathcal {T}_i\) is the \(\sigma \)-field generated by \((T_{i}, \lambda _{i},T_{i-1},\lambda _{i-1},\ldots )\), and assume that \(\lambda _{i}=\kappa (\lambda _{i-1}, T_{i-1})\) for some \(\kappa :[0,\infty )\times \mathbb {N}\rightarrow (0,\infty )\). Such a process is ergodic under suitable conditions on \(\kappa \), but it is not mixing in general; see Remark 3 of Neumann [58] for a counterexample. We refer to Leucht and Neumann [52] for further details and motivations for the use of the ergodicity assumption. One of their arguments is that, for certain classes of processes, it can be much easier to prove ergodicity than a mixing condition. It is known that any sequence \(\{\varepsilon _\mathrm{t}:t\in \mathbb {Z}\}\) of i.i.d. random variables is ergodic. Hence, it is immediately clear that \(\{\mathbf{Y}_\mathrm{t} :t \in \mathbb {Z}\}\) with

$$\begin{aligned} \mathbf{Y}_\mathrm{t} = \vartheta ((\ldots , \varepsilon _{\mathrm{t}-1}, \varepsilon _\mathrm{t} ), (\varepsilon _{\mathrm{t}+1},\varepsilon _{\mathrm{t}+2},\ldots )) \end{aligned}$$

is also ergodic. Didi [25] has constructed an example of a non-mixing ergodic continuous time process. It is well known that the fractional Brownian motion \(\{W_\mathrm{t}^H:t\ge 0\}\) with parameter \(H\in (0,1)\) has strictly stationary increments. In particular, the fractional Gaussian noise, defined for every \(s>0\) by

$$\begin{aligned} \{G_\mathrm{t}^H:t\ge 0\}:=\{W_{\mathrm{t}+s}^H-W_\mathrm{t}^H :t\ge 0\}, \end{aligned}$$

is a strictly stationary centered long-memory process when \(H\in (\frac{1}{2},1)\) (see, for instance, Beran [4, p. 55] and Lu [53, p. 17]); hence the strong mixing condition is not satisfied. Let \(\{G_\mathrm{t}:t\ge 0\}\) be a strictly stationary centered Gaussian process with correlation function

$$\begin{aligned} R(t)=\mathbb {E}[G_0G_\mathrm{t}]. \end{aligned}$$

Relying on Lemma 4.2 in Maslowski and Pospíšil [54], it follows that the process \(\{G_\mathrm{t}:t\ge 0\}\) is ergodic whenever

$$\begin{aligned} \lim _{\mathrm{t}\rightarrow \infty } R(t)=0, \end{aligned}$$

which is the case for the process \(\{G_\mathrm{t}^H: t\ge 0\}\). The ergodicity hypothesis thus seems to be the most natural one, and it provides a better framework to study data series generated, for example, by noisy chaos.
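For concreteness, here is a minimal simulation sketch of the Poisson autoregression of Leucht and Neumann [52] recalled above. The linear intensity map \(\kappa \) below is an illustrative choice (a contraction when \(a+b<1\)), not the only one covered by their results.

```python
import numpy as np

def poisson_autoregression(n, kappa, lam0=1.0, seed=0):
    """Simulate T_i | past ~ Poisson(lam_i), lam_i = kappa(lam_{i-1}, T_{i-1}):
    a strictly stationary ergodic count process, in general not mixing."""
    rng = np.random.default_rng(seed)
    lam = np.empty(n)
    counts = np.empty(n, dtype=np.int64)
    lam[0] = lam0
    counts[0] = rng.poisson(lam[0])
    for i in range(1, n):
        lam[i] = kappa(lam[i - 1], counts[i - 1])
        counts[i] = rng.poisson(lam[i])
    return counts, lam

# illustrative linear map kappa(lam, t) = omega + a*lam + b*t with a + b < 1
counts, lam = poisson_autoregression(10_000, lambda l, t: 0.5 + 0.3 * l + 0.4 * t)
print(counts.mean())   # time average; close to E[T_0] by the ergodic theorem
```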

To the best of our knowledge, the results presented here respond to a problem that has not been studied systematically until recently, which is the main motivation of this paper. Indeed, we establish the exact rate of strong uniform consistency of the estimator \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) and we characterise its limiting law. To prove our results, we base our methodology on a martingale approximation, which provides a unified nonparametric time series analysis framework enabling a systematic study of dependent data. This methodology is quite different from the approaches existing in the i.i.d. context.

The layout of the article is as follows. The assumptions and asymptotic properties of the estimators are given in Sect. 2, which includes the optimal convergence rates and the asymptotic normality of the estimators \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\). Some concluding remarks and possible future developments are mentioned in Sect. 3. To avoid interrupting the flow of the presentation, all mathematical proofs are presented in Sect. 4.

2 Main results

Let us introduce some notation and definitions. Let \(\alpha = (\alpha _{1},\ldots ,\alpha _{d})\) be a multi-index of the nonnegative integers \(\alpha _{ i}\), set \(|\alpha |=\sum _{ i=1}^{d}\alpha _{ i}\), and let

$$\begin{aligned} D^{\alpha }=\frac{\displaystyle \partial ^{|\alpha |}}{\displaystyle (\partial x_{1})^{\alpha _{1}}\cdots (\partial x_{d})^{\alpha _{d}}} \end{aligned}$$

denote the partial differential operator of order \(|\alpha |\). For \(\alpha =0\), set \(D^{\alpha }=\mathrm {id}\), the identity operator. For continuous real-valued functions \(\zeta _{1}(\cdot )\) and \(\zeta _{2}(\cdot )\) that are s-times continuously differentiable on \(\mathbb {R}^{d}\), the Leibniz formula gives

$$\begin{aligned} D^{\alpha }(\zeta _{1}\zeta _{2})=\sum _{\{\beta \,:\,\beta \le \alpha \}}\left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) (D^{\beta }\zeta _{1})(D^{\alpha -\beta }\zeta _{2}), \quad \text{ where }\quad \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) =\frac{\alpha !}{(\alpha -\beta )!\,\beta !}. \end{aligned}$$

We will use the notation

$$\begin{aligned} D^{i}\zeta _{1}=\zeta _{1}^{(i)}~ \text{ for } ~ i=1,\ldots ,s. \end{aligned}$$

Let us define the partial derivatives of order one of the regression estimator by

$$\begin{aligned} \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \left( \frac{\displaystyle M_\mathrm{T}(\mathbf{x},\psi )}{\displaystyle f_\mathrm{T}(\mathbf{x})}\right) ^{(1)}\\= & {} \frac{\displaystyle M_\mathrm{T}^{(1)}(\mathbf{x},\psi )f_\mathrm{T}(\mathbf{x})-f_\mathrm{T}^{(1)}(\mathbf{x})M_\mathrm{T}(\mathbf{x},\psi )}{\displaystyle f_\mathrm{T}^2(\mathbf{x})}. \end{aligned}$$

The derivatives of order \(\alpha =1,2\) of the estimators \(f_\mathrm{T}(\mathbf{x})\) and \(M_\mathrm{T}(\mathbf{x},\psi )\) (the denominator and numerator of \(\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\), corresponding to \(\alpha =0\) below) are defined as follows

$$\begin{aligned} f_\mathrm{T}^{(\alpha )}(\mathbf{x})= \frac{1}{Th_\mathrm{T}^{d+\alpha }}\int _0^T K^{(\alpha )}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt, \end{aligned}$$

and

$$\begin{aligned} M_\mathrm{T}^{(\alpha )}(\mathbf{x},\psi )= \frac{1}{Th_\mathrm{T}^{d+\alpha }}\int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(\alpha )}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt. \end{aligned}$$
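These derivative estimators are also easy to approximate on discretised data. Below is a minimal sketch for \(d=1\) with a Gaussian kernel (for which \(K^{(1)}(u)=-uK(u)\)), combining \(f_\mathrm{T}^{(1)}\) and \(M_\mathrm{T}^{(1)}\) through the quotient rule displayed above; replacing the time integrals by sample averages over a discretised path is again an assumption of the sketch.

```python
import numpy as np

def m_hat_first_derivative(x, X, Y, h, psi=lambda y: y):
    """Approximate m_hat_T^(1)(x, psi) for d = 1 via the quotient rule
    (M_T^(1) f_T - f_T^(1) M_T) / f_T^2, with sample averages for (1/T) integrals."""
    u = (x - X) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    K1 = -u * K                                      # its first derivative
    f_T = np.mean(K) / h                             # f_T(x): order alpha = 0
    f_T1 = np.mean(K1) / h ** 2                      # f_T^(1)(x): alpha = 1
    M_T = np.mean(psi(Y) * K) / h                    # M_T(x, psi)
    M_T1 = np.mean(psi(Y) * K1) / h ** 2             # M_T^(1)(x, psi)
    return (M_T1 * f_T - f_T1 * M_T) / f_T ** 2
```

A root of this function near the grid maximiser offers an alternative way to locate \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\).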

We denote by \(m^{(1)}(\cdot ,\psi )\) the gradient of the function \(m(\cdot ,\psi )\) : \(\mathbb {R}^{d} \rightarrow \mathbb { R}\), that is, \(m^{(1)}(\cdot ,\psi )\) is the \(d \times 1\)-vector of the partial derivatives of \(m(\cdot ,\psi )\)

$$\begin{aligned} m^{(1)}(\cdot ,\psi )=\left( \frac{\partial }{\partial x_{1}}m(\cdot ,\psi ),\ldots ,\frac{\partial }{\partial x_{d}}m(\cdot ,\psi )\right) ^{\top }. \end{aligned}$$

Using the definition of the conditional mode function, i.e. the mode of \(m(\cdot ,\psi )\), we have

$$\begin{aligned} m^{(1)}({\varvec{\Theta }},\psi ) =\left( \frac{\partial }{\partial x_{1}}m({\varvec{\Theta }},\psi ),\ldots ,\frac{\partial }{\partial x_{d}}m({\varvec{\Theta }},\psi )\right) ^{\top }= 0. \end{aligned}$$
(2.1)

Similarly, by the definition of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) as a maximiser of \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\) (assumed to lie in the interior of \(\mathfrak {C}\)), it follows that

$$\begin{aligned} \widehat{m}^{(1)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) =\left( \frac{\partial }{\partial x_{1}}\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ),\ldots ,\frac{\partial }{\partial x_{d}}\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\right) ^{\top }= 0. \end{aligned}$$

We denote by \(m^{(2)}(\cdot ,\psi )\) the Hessian of the function \(m(\cdot ,\psi )\), that is, the \(d \times d\) matrix of the second partial derivatives of \(m(\cdot ,\psi )\). Furthermore, assumption (A.7) implies that

$$\begin{aligned} m^{(2)}({\varvec{\Theta }},\psi )<0, \quad \text{ and } \quad \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) <0. \end{aligned}$$

By the definition of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), we have \(\widehat{m}^{(1)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) = 0\) so that

$$\begin{aligned} \widehat{m}^{(1)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )-\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi )=-\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi ). \end{aligned}$$
(2.2)

For each \(i \in \{1,\ldots , d\}\), a Taylor expansion applied to the real-valued function \(\frac{\partial }{\partial x_{i}} \widehat{m}_\mathrm{T}(\cdot ,\psi )\) implies the existence of \({\varvec{\Theta }}_\mathrm{T}^\star (i)=(\Theta _{\mathrm{T},1}^\star (i),\ldots ,\Theta _{\mathrm{T},d}^\star (i))^{\top }\) such that

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \frac{\partial }{\partial x_{i}}\widehat{m}_\mathrm{T}(\widehat{{\varvec{\Theta }}}_\mathrm{T},\psi ) - \displaystyle \frac{\partial }{\partial x_{i}}\widehat{m}_\mathrm{T}({{\varvec{\Theta }}},\psi ) =\sum _{j=1}^{d}\displaystyle \frac{\partial ^2}{\partial x_{i}\partial x_{j}}\widehat{m}_\mathrm{T}({\varvec{\Theta }}_\mathrm{T}^\star (i),\psi )(\widehat{\Theta }_{\mathrm{T},j}-\Theta _{j}), \\ \left| \Theta _{\mathrm{T},j}^\star (i)-\Theta _{j}\right| \le |\widehat{\Theta }_{\mathrm{T},j}-\Theta _{j}|, ~~j\in \{1,\ldots ,d\}. \end{array}\right. \end{aligned}$$
(2.3)

Define the \(d \times d\) matrix \(H_\mathrm{T} = (H_{\mathrm{T},i,j})_{1\le i,j\le d}\) by setting

$$\begin{aligned} H_{\mathrm{T},i,j}= \frac{\partial ^2}{\partial x_{i}\partial x_{j}}\widehat{m}_\mathrm{T}({\varvec{\Theta }}_\mathrm{T}^\star (i),\psi ). \end{aligned}$$

Equation (2.2) can then be rewritten as

$$\begin{aligned} H_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{{\varvec{\Theta }}})=-\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi ). \end{aligned}$$
(2.4)
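Numerically, (2.4) says that the deviation \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\) is recovered from a single linear solve. A minimal sketch with placeholder values for \(H_\mathrm{T}\) and the gradient (in practice both would come from kernel derivative estimates as above):

```python
import numpy as np

# placeholder Hessian-type matrix H_T and gradient m_hat_T^(1)(Theta, psi), d = 2
H_T = np.array([[-2.0, 0.1],
                [0.1, -1.5]])        # negative definite near the peak
grad = np.array([0.03, -0.02])

# equation (2.4): H_T (Theta_hat - Theta) = -m_hat_T^(1)(Theta, psi)
deviation = np.linalg.solve(H_T, -grad)
print(deviation)
```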

Relation (2.4) will play an important role in our proofs, in particular for the asymptotic normality. To formulate our assumptions, some additional notation is required. For some constant \(\delta >0\) small enough, let \(n\in \mathbb {N}\) be such that \(T=\delta n\), and set \(T_j=j\delta \) for \(j=1,\ldots ,n\). Let \(\mathcal {F}_\mathrm{t}\) be the \(\sigma \)-field defined by

$$\begin{aligned} \mathcal {F}_\mathrm{t}:=\sigma \{(\mathbf{X}_s,\mathbf{Y}_s): 0\le s< t \}. \end{aligned}$$

Set \(\mathcal {F}_{j}\) to be the \(\sigma \)-field defined by

$$\begin{aligned} \mathcal {F}_{j}:=\sigma \{(\mathbf{X}_s,\mathbf{Y}_s): 0\le s\le T_j\}. \end{aligned}$$

Let \( \mathcal {S}_{\mathrm{t},\delta }\) be the \(\sigma \)-field defined by

$$\begin{aligned} \mathcal {S}_{\mathrm{t},\delta }:= \sigma \{(\mathbf{X}_s,\mathbf{Y}_s),(\mathbf{X}_r): 0\le s< t; t\le r \le t+\delta \}. \end{aligned}$$

Let \(\mathcal {G}_t:=\sigma \{(\mathbf{X}_s,\mathbf{Y}_s): 0\le s\le t\}\), and, for \(\delta >0\) small enough, let \(g^{\mathcal {G}_{t-\delta }}(\cdot )\) and \(\rho ^{\mathcal {G}_{t-\delta }}(\cdot )\) be the conditional densities of \((\mathbf{X},\mathbf{Y})\) and \(\mathbf{Y}\), respectively, given the \(\sigma \)-field \(\mathcal {G}_{t-\delta }\). Finally, if \(\zeta (\cdot )\) is a real-valued random function which satisfies \(\zeta (u) / u \rightarrow 0\) a.s. as \(u \rightarrow 0\), we write \(\zeta (u)=o_{\text{ a.s. } }(u)\). In the same way, we say that \(\zeta (u)\) is \(O_{\text{ a.s. } }(u)\) if \(\zeta (u) / u\) is a.s. bounded as \(u \rightarrow 0\).

2.1 Assumptions

In our analysis, the following assumptions are needed.

  1. (A.1)

The kernel \(K(\cdot )\) is a compactly supported probability density function such that:

    1. (i)

the kernel \(K(\cdot )\) is Lipschitz of order \(\gamma \) with constant \(C_K<\infty \), i.e.,

      $$\begin{aligned} |K(\mathbf{x})-K(\mathbf{x}^{'}) | \le C_K\Vert \mathbf{x}-\mathbf{x}^{'}\Vert ^{\gamma }, \quad (\mathbf{x},\mathbf{x}^{'})\in \mathbb {R}^{2d}; \end{aligned}$$
    2. (ii)

      \(\int _{\mathbb {R}^d} \Vert \mathbf{x}\Vert K(\mathbf{x}) d\mathbf{x} <\infty ;\)

  2. (A.2)

There exists a constant \(\Gamma < \infty \) such that

    $$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } |m(\mathbf{x},\psi )|< \Gamma ; \end{aligned}$$
  3. (A.3)
    1. (i)

Recall that \(\mathfrak {C}\) is a compact set of \(\mathbb {R}^d\). Assume that there exist constants \(0<\lambda \le \eta <\infty \) such that, for all \(\mathbf{x} \in \mathfrak {C}\),

      $$\begin{aligned} \lambda \le f(\mathbf{x}) \le \eta ; \end{aligned}$$
    2. (ii)

      the density \(f(\cdot )\) is an element of \(\mathcal {C}^2(\mathbb {R}^{d})\);

  4. (A.4)

    For every \(t\in \mathbb {R}_+\), for every \(\mathbf{x}\in \mathbb {R}^{d},\)

    1. (i)

      The conditional density \(f_{\mathbf{X}_\mathrm{t}}^{\mathcal {F}_{\mathrm{t}- \delta }}(\cdot )\) of \(\mathbf{X}_\mathrm{t}\) given the \(\sigma \)-field \(\mathcal {F}_{\mathrm{t}- \delta }\) exists a.s. and is an element of \(\mathcal {C}^2(\mathbb {R}^{d})\);

    2. (ii)

      For any \(\delta >0\) small enough

      $$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \frac{1}{T} \int _0^T f_{\mathbf{X}_\mathrm{t}}^{\mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt = f(\mathbf{x}) , \quad \text{ a.s. }; \end{aligned}$$
  5. (A.5)


    $$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \underset{\mathbf{x} \in \mathbb {R}^d}{\sup } \left| \frac{1}{T} \int _0^T f_{\mathbf{X}_\mathrm{t}}^{\mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt - f(\mathbf{x}) \right| = 0, \quad \text{ a.s. }, \end{aligned}$$

    for any \(\delta >0\) small enough;

  6. (A.6)

    For any t and r such that \(t\in [0,T]\) and \(t\le r\le t+\delta \) we have

    1. (i)
$$\begin{aligned} \mathbb {E}(\psi (\mathbf{Y}_r)\vert \mathcal {S}_{\mathrm{t},\delta })=\mathbb {E}(\psi (\mathbf{Y}_r)\vert \mathbf{X}_r)=m(\mathbf{X}_r,\psi ); \end{aligned}$$
    2. (ii)

      there exist constants \(C_\psi >0\) and \(\beta >0\) such that, for any couple \((\mathbf{x},\mathbf{x}^\prime )\in \mathbb {R}^{2d}\),

      $$\begin{aligned} \left| m(\mathbf{x},\psi )-m(\mathbf{x}^\prime ,\psi )\right| \le C_\psi \left\| \mathbf{x}-\mathbf{x}^\prime \right\| ^\beta ; \end{aligned}$$
    3. (iii)

      For any \(k\ge 2\) and any \(\delta >0\),

      $$\begin{aligned} \mathbb {E}\left( \left| \psi ^k(\mathbf{Y}_r)\right| \vert \mathcal {S}_{\mathrm{t},\delta }\right) =\mathbb {E}\left( \left| \psi ^k(\mathbf{Y}_r)\right| \vert \mathbf{X}_r\right) , \end{aligned}$$

and the function \(\Phi _k(\mathbf{x},\psi )=\mathbb {E}\left( \left| \psi ^k(\mathbf{Y})\right| \vert \mathbf{X}=\mathbf{x}\right) \) is continuous in a neighbourhood of \(\mathbf{x}\);

  7. (A.7)

    For any fixed \(\mathbf{x} \in \mathbb {R}^d\),

    1. (i)

      \(m(\mathbf{x},\psi )\) is twice differentiable on \(\mathbb {R}^{d}\), the matrix \(m^{(2)}(\mathbf{x},\psi )\) is continuous in a neighbourhood of \({\varvec{\Theta }}\), and \(m^{(2)}({\varvec{\Theta }},\psi )\) is nonsingular;

    2. (ii)

\(m^{(2)}(\cdot ,\psi )\) is bounded on \(\mathbb {R}^{d}\).

2.2 Comments on hypotheses

Conditions (A.1) are very common in the nonparametric function estimation literature; they impose some regularity upon the kernels used in our estimates. In particular, under condition (A.1)(i), the kernel function exploits the smoothness of the density or regression function. If we relax the requirement that the kernel \(K(\cdot )\) be a density, the convergence rate can be faster; indeed, it can be made arbitrarily close to the parametric rate \(n^{-1}\) as the order of the kernel increases. In fact, Chacón et al. [15] showed that the parametric rate \(n^{-1}\) can be attained by the use of super-kernels, and that super-kernel density estimators automatically adapt to the unknown degree of smoothness of the density. The main drawback of higher-order kernels in this situation is that they take negative values, so the estimated density need not be a density itself. The interested reader is referred to, e.g., Jones et al. [44], Jones and Signorini [43] and Jones [42]. Notice that \((\psi ^2(\mathbf{Y}_\mathrm{t}))_{\mathrm{t}\ge 0}\) is a measurable transform of the stationary ergodic process \((\mathbf{X}_\mathrm{t},\mathbf{Y}_\mathrm{t})_{\mathrm{t}\ge 0}\), hence itself stationary and ergodic. Therefore, making use of Proposition 4.3 of Krengel [47] and then the ergodic theorem, we obtain

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \frac{1}{T} \int _0^T \psi ^2(\mathbf{Y}_\mathrm{t}) dt = \mathbb {E}\left[ \psi ^2(\mathbf{Y}_0)\right] \quad \text{ a.s. } \end{aligned}$$
(2.5)

Condition (A.3)(i) is a technical condition that simplifies our proofs; precisely, we assume that the density function \(f(\cdot )\) is bounded away from zero and infinity on the compact set \(\mathfrak C\), in a similar way as in Ziegler [79], Stute [70], Harel and Puri [40] and Debbarh [20]. For any set \(B\subset \mathbb {R}^d\) and \(\epsilon >0\), denote by \(B^\epsilon \) the set of all \(\mathbf{x}\in \mathbb {R}^d\) such that there exists \(\mathbf{y}\in B\) with \(\Vert \mathbf{x}-\mathbf{y}\Vert <\epsilon \). One could instead assume only that \(f(\cdot )\) is continuous and strictly positive on \(\mathfrak C^\epsilon \), but this would add much extra complexity to the proofs. Condition (A.4) involves the ergodic nature of the data as given, for instance, in Györfi et al. [36]. Assume that \(\rho ^{\mathcal G_{\mathrm{t}-\delta }}(\cdot )\) and \(g^{\mathcal G_{\mathrm{t}-\delta }}(\cdot ,\cdot )\) belong, at least, to the space \(\mathcal {C}^0\) of continuous functions, which is a separable Banach space. Moreover, approximating the integrals \(\displaystyle {\int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y}) dt}\) and \(\displaystyle {\int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y}) dt}\) by their Riemann sums, it follows that

$$\begin{aligned} \displaystyle { T^{-1}\int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y}) dt}&\backsimeq&\displaystyle { n^{-1}\sum _{i=1}^{n} \rho ^{\mathcal G_{\mathrm{t}_i-\delta }}(\mathbf{y}) } \\= & {} \displaystyle {n^{-1} \sum _{j=1}^{n} \rho ^{\mathcal G_{(j-1)\delta }}(\mathbf{y}) }, \end{aligned}$$

and

$$\begin{aligned} \displaystyle { T^{-1}\int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y}) dt}&\backsimeq&\displaystyle { n^{-1}\sum _{i=1}^{n} g^{\mathcal G_{\mathrm{t}_i-\delta }}(\mathbf{x},\mathbf{y}) } \\= & {} \displaystyle {n^{-1} \sum _{j=1}^{n} g^{\mathcal G_{(j-1)\delta }}(\mathbf{x},\mathbf{y}) }. \end{aligned}$$

Since the processes \((\mathbf{X}_{\mathrm{T}_j}, \mathbf{Y}_{\mathrm{T}_j})_{j\ge 1}\) and \((\mathbf{Y}_{\mathrm{T}_j})_{j\ge 1}\) are stationary and ergodic (see Proposition 4.3 of Krengel [47]), following Delecroix [21] (see Lemma 4 and Corollary 1 along with their proofs), one may prove that the sequences \((\rho ^{\mathcal G_{(j-1)\delta }}(\mathbf{y}))_{j\ge 1}\) and \((g^{\mathcal G_{(j-1)\delta }}(\mathbf{x},\mathbf{y}))_{j\ge 1}\) of conditional densities are stationary and ergodic. Moreover, making use of Beck's [2] theorem (see, for instance, Györfi et al. [36], Theorem 2.1.1), it follows that

$$\begin{aligned}&\lim _{T\rightarrow \infty }\sup _{\mathbf{y}\in \mathbb {R}^q}\left| \frac{1}{T} \int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y}) dt-\mathbb {E}(\rho ^{\mathcal G_{-\delta }}(\mathbf{y}))\right| \\&\quad =\lim _{T\rightarrow \infty }\sup _{\mathbf{y}\in \mathbb {R}^q}\left| \frac{1}{T} \int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y})dt-\rho (\mathbf{y})\right| = 0, \quad \text{ a.s. }, \end{aligned}$$

and

$$\begin{aligned}&\lim _{\mathrm{T}\rightarrow \infty }\sup _{\mathbf{x}\in \mathbb {R}^d}\left| \frac{1}{T} \int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y}) dt-\mathbb {E}(g^{\mathcal G_{-\delta }}(\mathbf{x},\mathbf{y}))\right| \\&\quad =\lim _{\mathrm{T}\rightarrow \infty }\sup _{\mathbf{x}\in \mathbb {R}^d}\left| \frac{1}{T} \int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y})dt-g(\mathbf{x},\mathbf{y})\right| = 0, a.s. \end{aligned}$$

It is then clear that both the conditions (A.4) and (A.5) are satisfied. Condition (A.6)(i) is usual in the literature dealing with the study of ergodic processes. The hypothesis (A.6)(ii) is a regularity condition upon the regression function. For the condition (A.6)(iii), we can refer to the following examples.

Example 2.1

Consider the regression model \(Y_t= m(X_t) + \epsilon _t\), where the random variables \(\epsilon _t\) are martingale differences with respect to the \(\sigma \)-field \(\mathcal {S}_{r,\delta }\), \(r\le t\le r+\delta \), generated by \(\big \{(X_s,\epsilon _s), (X_t) : 0\le s< r,\; r\le t\le r + \delta \big \}\). Clearly, we have

$$\begin{aligned} \mathbb E[Y_t|\mathcal {S}_{r,\delta }]= m(X_t), \end{aligned}$$

almost surely.

Example 2.2

Consider the regression model \(Y_t= m(X_t) + \sigma (X_t)\epsilon _t\), where the random variables \(\epsilon _t\) are centered and independent of the process \((X_t)_{t\ge 0}\). Taking \(\mathcal {S}_{r,\delta }\) as the \(\sigma \)-field generated by \(\big \{(X_s): 0\le s\le r\big \}\), it follows, for \(t\le r\), that

$$\begin{aligned} \mathbb E[Y_t|\mathcal {S}_{r,\delta }]= \mathbb E[m(X_t)+\sigma (X_t)\epsilon _t|\mathcal {S}_{r,\delta }]=m(X_t)+\sigma (X_t)\mathbb E[\epsilon _t]=m(X_t), \end{aligned}$$

almost surely.

Remark 2.3

For notational convenience, we have chosen the same bandwidth sequence for all margins. This assumption can easily be dropped if one wants to make use of vector bandwidths (see, in particular, Chapter 12 of Devroye and Lugosi [24]). With obvious changes in the notation, our results and their proofs remain true when \(h_\mathrm{T}\) is replaced by a vector bandwidth \(\mathbf{h}_\mathrm{T} = (h^{(1)}_\mathrm{T}, \ldots , h^{(d)}_\mathrm{T})\), where \(\min _{1\le i\le d} h^{(i)}_\mathrm{T} > 0\). In this situation we set \(h_\mathrm{T}^{d}=\prod _{i=1}^{d} h_\mathrm{T}^{(i)}\), and for any vector \(\mathbf{v} = (v_{1} ,\ldots ,v_{d})\) we replace \(\mathbf{v}/h_\mathrm{T}\) by \((v_{1}/h_\mathrm{T}^{(1)},\ldots ,v_{d}/h_\mathrm{T}^{(d)})\), as sketched below. For ease of exposition, we use real-valued bandwidths throughout the text.
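A minimal sketch of the product-kernel evaluation with a coordinatewise bandwidth vector, as described in the remark; the one-dimensional Gaussian kernel is an illustrative choice.

```python
import numpy as np

def product_kernel(v, h_vec):
    """Evaluate K(v_1/h^(1)) * ... * K(v_d/h^(d)) / (h^(1) * ... * h^(d))
    for a vector bandwidth h_vec, with a Gaussian one-dimensional kernel."""
    u = v / h_vec                                    # coordinatewise v_j / h^(j)
    k1d = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return np.prod(k1d) / np.prod(h_vec)

print(product_kernel(np.array([0.2, -0.1]), np.array([0.5, 0.8])))
```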

2.3 Theoretical properties

Below, we write \(Z {\mathop {=}\limits ^{\mathcal {D}}} \mathcal {N}(\mu , \sigma ^{2} )\) whenever the random variable Z follows a normal law with expectation \(\mu \) and variance matrix \(\sigma ^{2}\), \({{\mathop {\rightarrow }\limits ^{\mathcal {D}}}}\) denotes the convergence in distribution and \({{\mathop {\rightarrow }\limits ^{\mathbb {P}}}}\) the convergence in probability.

2.3.1 Consistency

The following theorem gives the almost sure consistency result.

Theorem 2.4

Under the hypotheses (A.1)–(A.4) and (A.6), for any n large enough, we have

$$\begin{aligned} \Vert \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }} \Vert = O(h_\mathrm{T}^\beta )+O\left( \left( \frac{\log {T}}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) ,\ \text{ a.s. } \end{aligned}$$

The proof of Theorem 2.4 is postponed to Sect. 4.
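For orientation, the two terms in this rate balance when \(h_\mathrm{T}^{\beta }\asymp (\log T/(Th_\mathrm{T}^{d}))^{1/2}\); a short side calculation (not part of the theorem, which allows any bandwidth satisfying (1.1)) gives

$$\begin{aligned} h_\mathrm{T}^{2\beta +d}=\frac{\log T}{T} \quad \Longleftrightarrow \quad h_\mathrm{T}=\left( \frac{\log T}{T}\right) ^{1/(2\beta +d)}, \quad \text{ whence }\quad \Vert \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\Vert = O\left( \left( \frac{\log T}{T}\right) ^{\beta /(2\beta +d)}\right) \ \text{ a.s. } \end{aligned}$$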

2.3.2 Asymptotic normality

To establish the asymptotic normality of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), in view of the statement (2.4), we have to prove that the gradient \(\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi )\), suitably normalised, is asymptotically normally distributed, and that the matrix \(H_\mathrm{T}\) converges in probability to \(m^{(2)}({\varvec{\Theta }},\psi )\). Let G be the \(d \times d\) matrix defined by, for \(i,j=1,\ldots ,d\),

$$\begin{aligned} G_{i,j}=\int _{\mathbb {R}^{d} }\frac{\partial }{\partial u_{i}}K(\mathbf{u})\frac{\partial }{\partial u_{j}}K(\mathbf{u})d\mathbf{u}. \end{aligned}$$

Let us also introduce \(V({\varvec{\Theta }},\psi )\), the \(d \times d\) matrix defined, for \(i,j=1,\ldots ,d\), by

$$\begin{aligned} V_{i,j}({\varvec{\Theta }},\psi )=\frac{\mathbb {E}(|\psi ^2(\mathbf{Y})|\vert \mathbf{X}={\varvec{\Theta }})}{f({\varvec{\Theta }})}G_{i,j}. \end{aligned}$$

The main result to be proved here may now be stated precisely as follows.

Theorem 2.5

  1. 1.

    Under the assumptions (A.1), (A.3)(i)–(ii), (A.4) and (A.6), for any n large enough, we have

$$\begin{aligned} \sqrt{Th_\mathrm{T}^{d+1}}\, \widehat{m}_\mathrm{T}^{(1)}({\varvec{\Theta }},\psi ){\mathop {\rightarrow }\limits ^{\mathcal {D}}}\mathcal {N}(0,{V({\varvec{\Theta }},\psi )}). \end{aligned}$$
  2. 2.

If the assumptions (A.1)–(A.5), (A.6)(i) and (A.7) are fulfilled, then, as \(T\rightarrow \infty \), \(\widehat{m}_\mathrm{T}^{(2)}(\cdot ,\psi )\) converges uniformly to \(m^{(2)}(\cdot ,\psi )\) on the compact set \(\mathfrak {C}\) and, for any n large enough, we have

$$\begin{aligned} \sqrt{Th_\mathrm{T}^{d+1}} \left( \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\right) {\mathop {\rightarrow }\limits ^{\mathcal {D}}}\mathcal {N}(0,[m^{(2)}({\varvec{\Theta }},\psi )]^{-1}V({\varvec{\Theta }},\psi )[m^{(2)}({\varvec{\Theta }},\psi )]^{-1}). \end{aligned}$$
    (2.6)

The proof of Theorem 2.5 is postponed to Sect. 4.

2.4 Confidence set

The asymptotic variance in the central limit theorem depends on unknown functions, which have to be estimated in practice. Let us introduce \(\widehat{V}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\), an estimate of \(V({\varvec{\Theta }},\psi )\), that is, the \(d \times d\) matrix defined, for \(i,j=1,\ldots ,d\), by

$$\begin{aligned} \widehat{V}_{i,j}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )=\frac{ \widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ^{2})}{f_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T})}G_{i,j}. \end{aligned}$$

The asymptotic variance is estimated by

$$\begin{aligned}{}[\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}\widehat{V}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )[\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}. \end{aligned}$$

Furthermore, from (2.6), the approximate confidence region of \({\varvec{\Theta }}\) can be obtained as

$$\begin{aligned} {\varvec{\Theta }}\in \left[ \widehat{{{\varvec{\Theta }} }}_\mathrm{T} \pm c_\alpha \frac{\left[ [\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}\widehat{V}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )[\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}\right] ^{1/2}}{\sqrt{Th_T^{d+1}}}\right] , \end{aligned}$$

where \(c_\alpha \) denotes the \((1-\alpha )\)-quantile of the multivariate normal distribution; note that \(c_\alpha \) is not unique when \(d>1\), since \({\varvec{\Theta }}\) is a vector. Sinotina and Vogel [69] used a different approach to construct confidence sets, derived as suitable neighbourhoods of maximum points of a regression estimator; their approach relies on concentration-of-measure inequalities for the regression estimators.
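For the plug-in region above, the following is a numerical sketch for \(d=1\) (so all matrices are scalars and \(c_\alpha \) is the usual normal quantile). Every input value below is an illustrative placeholder that would, in practice, come from the kernel estimates.

```python
import numpy as np
from scipy.stats import norm

T, h_T, d = 500.0, 0.3, 1
theta_hat = 0.52      # location estimate
m2_hat = -2.1         # m_hat_T^(2)(theta_hat, psi), negative at a peak
V_hat = 0.8           # plug-in estimate of V(Theta, psi)

# asymptotic variance [m^(2)]^{-1} V [m^(2)]^{-1}, cf. (2.6)
avar = V_hat / m2_hat ** 2
se = np.sqrt(avar / (T * h_T ** (d + 1)))
c = norm.ppf(0.975)   # 95% two-sided level when d = 1
print(theta_hat - c * se, theta_hat + c * se)
```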

Remark 2.6

It can be observed that our proofs generalise those used for the kernel density mode. Hence, one can easily obtain the corresponding results for density mode estimators as a particular case of our setting. More precisely, one can consider the kernel estimator of the conditional density of \(\mathbf{Y}\) given \(\mathbf{X}=\mathbf{x}\), defined by

$$\begin{aligned} \widehat{g}_\mathrm{T}(\mathbf{y}\mid \mathbf{x}):= \frac{\displaystyle \frac{ 1}{Th_\mathrm{T}^d\breve{h}_\mathrm{T}^{q}}\int _0^T \mathbf{K}\left( \frac{\mathbf{y}-\mathbf{Y}_\mathrm{t}}{\breve{h}_\mathrm{T}}\right) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt}{\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt}, \quad \text{ for }\quad \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\ne 0, \end{aligned}$$

where \(\mathbf{K}(\cdot )\) is a kernel, \(\breve{h}_\mathrm{T}\) is a positive sequence of real numbers tending to 0 at a specific rate. We refer to Bouzebda et al. [8] for more details about the framework of functional ergodic discrete time processes.

Remark 2.7

Chen et al. [17] considered the conditional (or local) mode set at x, defined as

$$\begin{aligned} M(x) = \biggl \{y: \frac{\partial }{\partial y} p(y\mid x)=0, \frac{\partial ^2}{\partial y^2} p(y\mid x)<0 \biggr \}, \end{aligned}$$
(2.7)

where \(p(y\mid x) = p(x,y)/f(x)\) is the conditional density of Y given \(X=x\). Since \(f(x)\) does not depend on y, the set M(x) can be expressed in terms of the joint density as:

$$\begin{aligned} M(x) = \biggl \{y: \frac{\partial }{\partial y} p(x,y)=0, \frac{\partial ^2}{\partial y^2} p(x,y)<0 \biggr \}. \end{aligned}$$
(2.8)

At each x, the local mode set M(x) may consist of several points, so that M(x) is in general a multivalued function. Under appropriate conditions, these modes change smoothly as x changes; thus, local modes behave like a collection of surfaces, called modal manifolds in Chen et al. [17]. In our setting, we have considered the extension of the work of Ziegler [79] to the multivariate ergodic setting. The approaches are different, and the extension of Chen et al. [17] to the ergodic setting is of interest. The proof of such a statement, however, would require a different methodology than that used in the present paper, and we leave this problem open for future research.

Remark 2.8

In continuous time, data are often collected by using a sampling scheme. Several discretisation schemes have been proposed throughout the literature including deterministic and randomised sampling. The interested reader is referred to Masry [56], Prakasa Rao [62, 63], Bosq [7] and Blanke and Pumo [6]. To simplify the idea, we consider the density estimator of \(f(\cdot )\) based on \(\{\mathbf{X}_{t}: t\in [0,T]\}\) and let \(\{\mathbf{X}(t_k): k=1,\ldots ,n\}\) be its sampled discrete sequence. The sampled estimator of the density \(f(\cdot )\) is then

$$\begin{aligned} f_{n}(\mathbf{x})=\frac{1}{nh^{d}_{n}}\sum _{k=1}^{n}K\left( \frac{\mathbf{x}-\mathbf{X}_{t_{k}}}{h_{n}}\right) . \end{aligned}$$
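A minimal sketch of this sampled estimator for \(d=1\) (the Gaussian kernel is an illustrative choice):

```python
import numpy as np

def sampled_density(x, X_sampled, h_n):
    """Kernel density estimate built from the sampled sequence {X(t_k)}, d = 1."""
    n = len(X_sampled)
    K = np.exp(-0.5 * ((x - X_sampled) / h_n) ** 2) / np.sqrt(2 * np.pi)
    return K.sum() / (n * h_n)
```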

As in Masry [56], we only recall two cases of designs: irregular sampling and random sampling.

Deterministic sampling.:

Consider the case where the instants \((t_{k})_{1\le k\le n}\) are deterministic irregularly spaced with

$$\begin{aligned} \inf _{1\le k< n}|t_{k+1}-t_{k}|=\frac{1}{\tau }, \end{aligned}$$

for some \(\tau >0\). For \(1\le k \le n\), consider \(\mathcal {G}_{k}\), the \(\sigma \)-field generated by \(\{\mathbf{X}_{s} :0\le s\le t_{k}\}\). Obviously, \((\mathcal {G}_{k})_{1\le k \le n}\) is an increasing family of \(\sigma \)-fields.

Random sampling.:

Assume that the instants \((t_{k})_{1\le k\le n}\) form a sequence of uniform random variables in the interval [0, T] independent of the process \(\{\mathbf{X}_{t}: t\in [0,T]\}\). Define

$$\begin{aligned} 0 \le \tau _{1}< \cdots < \tau _{n} \le T \end{aligned}$$

as the associated order statistics. Notice that \((\tau _{k})_{1\le k \le n}\) are the process observation points; obviously, the spacings between these points are all positive. As a consequence, taking \(\mathcal {G}_{k}\) to be the \(\sigma \)-field generated by \(\{\mathbf{X}_{s} :0\le s\le \tau _{k}\}\), it follows that \((\mathcal {G}_{k})_{1\le k \le n}\) is an increasing sequence of \(\sigma \)-fields.

We would like to mention here that the penalisation procedure for the choice of the mesh \(\delta \) of the observations gives an optimal rate of convergence, as demonstrated in Comte and Merlevède [19]; we leave this problem open for future research in the framework of ergodic processes.

3 Concluding remarks

In the present paper, we are mainly concerned with the nonparametric regression model in which the regression function \(m(\cdot , \psi )\) is given by \(m(\mathbf{x},\psi ) = \mathbb {E}(\psi (\mathbf{Y}) \mid \mathbf{X} = \mathbf{x})\) for a measurable function \(\psi : \mathbb {R}^{q} \rightarrow \mathbb {R}\). Estimation of the location \({\varvec{\Theta }}\) (mode) of a unique maximum of \(m(\cdot , \psi )\) by the location \( \widehat{{\varvec{\Theta }}}_\mathrm{T}\) of a maximum of the Nadaraya–Watson kernel estimator \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\) of the curve \(m(\cdot , \psi )\) is considered. Within this context, we obtain consistency and asymptotic normality results for \( \widehat{{\varvec{\Theta }}}_\mathrm{T}\) under mild local smoothness assumptions on \(m(\cdot , \psi )\) and the design density of \(\mathbf{X}\). It is worth noticing that the ergodic framework covers and completes various situations compared to the mixing case and is more convenient to use in practice; in this sense our work extends the existing research in the literature. We have illustrated how to use our results to construct a confidence set for the mode \({\varvec{\Theta }}\). In future research, one could consider the same estimation problem for stationary and ergodic discrete time processes in the case of censored data. It would also be of interest to relax stationarity to local stationarity and establish results similar to those presented in this work, which would require a different mathematical methodology than the one used here. We leave these problems open for further investigation.

4 Proofs

This section is devoted to the proofs of our results. The previously defined notation continues to be used in what follows.

From the definition of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) in (1.2) and that of \({\varvec{\Theta }}\), we have

$$\begin{aligned} |m(\widehat{{{\varvec{\Theta }} }}_\mathrm{T} ,\psi )-m({\varvec{\Theta }},\psi ) |\le & {} |\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )- m(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) | \nonumber \\&+ |\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T} ,\psi )- m({\varvec{\Theta }},\psi )| \nonumber \\\le & {} \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )- m(\mathbf{x},\psi )\right| \nonumber \\&+ \left| \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\ \widehat{m}_\mathrm{T}(\mathbf{x},\psi ) - \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\ m(\mathbf{x},\psi )\right| \nonumber \\\le & {} 2\ \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )- m(\mathbf{x},\psi )\right| . \end{aligned}$$
(4.1)

Consider the following decomposition

$$\begin{aligned} Q_\mathrm{T}(\mathbf{x},\psi )&:= ({\Psi }_\mathrm{T}(\mathbf{x},\psi ) - \bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ))-m(\mathbf{x},\psi )(f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x})), \end{aligned}$$
(4.2)
$$\begin{aligned} R_\mathrm{T}(\mathbf{x},\psi )&:= -B_\mathrm{T}(\mathbf{x},\psi )(f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x})),\end{aligned}$$
(4.3)
$$\begin{aligned} B_\mathrm{T}(\mathbf{x},\psi )&:=\frac{\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi )}{\bar{f}_\mathrm{T}(\mathbf{x})}-m(\mathbf{x},\psi ) ,\end{aligned}$$
(4.4)
$$\begin{aligned} \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi )&= B_\mathrm{T}(\mathbf{x},\psi ) + \frac{ Q_\mathrm{T}(\mathbf{x},\psi )+R_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})}\frac{f(\mathbf{x})}{f_\mathrm{T}(\mathbf{x})}, \end{aligned}$$
(4.5)

where

$$\begin{aligned} \bar{f}_\mathrm{T}(\mathbf{x})= & {} \frac{1}{Th_\mathrm{T}^d} \int _0^T \mathbb {E} \left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{\mathrm{t}-\delta } \right] dt,\\ \Psi _\mathrm{T}(\mathbf{x},\psi )= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \psi (\mathbf{Y}_\mathrm{t})K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt, \end{aligned}$$

and

$$\begin{aligned} \bar{\Psi }_\mathrm{T}(\mathbf{x},\psi )= \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E} \left[ \psi (\mathbf{Y}_\mathrm{t})K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{\mathrm{t}-\delta } \right] dt. \end{aligned}$$

The following simple lemmas will play an instrumental role in the sequel.

Lemma 4.1

Let \((Z_n)_{n\ge 1} \) be a sequence of real martingale differences with respect to the sequence of \(\sigma \)-fields \((\mathcal {F}_n)_{n\ge 1}\), where \(\mathcal {F}_n= \sigma (Z_1,\ldots ,Z_n)\) is the \(\sigma \)-field generated by the random variables \(Z_1,\ldots ,Z_n\). Set

$$\begin{aligned} S_n= \sum _{i=1}^{n} Z_i. \end{aligned}$$

For any \(p \ge 2\) and any \(n\ge 1\), assume that there exist some nonnegative constants C and \(d_n\) such that

$$\begin{aligned} \mathbb {E} \left[ Z_n^p | \mathcal {F}_{n-1}\right] \le C^{p-1} p!\ d_n^2, \quad \text{ almost } \text{ surely }. \end{aligned}$$

Then, for any \(\epsilon >0\), we have

$$\begin{aligned} \mathbb {P} \left( | S_n| > \epsilon \right) \le 2 \exp \left\{ -\frac{\epsilon ^2}{2(D_n+C\epsilon )}\right\} , \end{aligned}$$

where

$$\begin{aligned} D_n = \sum _{i=1}^n d_i^2. \end{aligned}$$

Lemma 4.2

Let \(\Lambda \times \Lambda ^{'}\) be an index set and, for each \((\eta ,\eta ^{'}) \in \Lambda \times \Lambda ^{'}\), let \(\{ Z_i(\eta ,\eta ^{'}): i \ge 1\}\) be a sequence of martingale differences such that \(\left| Z_i(\eta ,\eta ^{'})\right| \le B\) a.s. Then, for all \(\epsilon >0\) and all sufficiently large n, we have

$$\begin{aligned} P\left\{ \left| \sum _{i=1}^n Z_i(\eta ,\eta ^{'}) \right| > \epsilon \right\} \le 2 \exp \left\{ -\frac{\epsilon ^2}{2nB^2} \right\} . \end{aligned}$$

The following proposition describes the almost sure consistency of \(\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\) with rate.

Proposition 4.3

Under assumptions (A.1)–(A.4) and (A.6), we have

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi )\right|= & {} O(h_\mathrm{T}^\beta )+O\left( \left( \frac{\log {T}}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s }. \end{aligned}$$
(4.6)

4.1 Proof of Proposition 4.3

Making use of conditions (A.2) and (A.3), we infer readily that

$$\begin{aligned}&\underset{\mathbf{x} \in \mathfrak {C}}{\sup }| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi ) |\nonumber \\&\quad =\underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi ) + \frac{ Q_\mathrm{T}(\mathbf{x},\psi )+R_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})} \frac{f(\mathbf{x})}{f_\mathrm{T}(\mathbf{x})}\right| \nonumber \\&\quad \le \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi ) \right| +\frac{1}{\lambda }\ \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \frac{ Q_\mathrm{T}(\mathbf{x},\psi ) + R_\mathrm{T}(\mathbf{x},\psi ) }{ \frac{f_\mathrm{T}(\mathbf{x})}{f(\mathbf{x})}}\right| . \end{aligned}$$
(4.7)

Lemma 4.4

(Didi and Louani [26]) Let \((\mathbf{X}_\mathrm{t})_{\mathrm{t}\ge 0}\) be a strictly stationary and ergodic process. Under (A.1) and (A.4), we then have

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\left| \frac{f_\mathrm{T}(\mathbf{x})}{f(\mathbf{x})}-1\right| =o_{a.s}(1), \quad \text{ as }\quad T\longrightarrow \infty . \end{aligned}$$
(4.8)

4.2 Proof of Lemma 4.4

Notice that we have the following decomposition

$$\begin{aligned} \frac{f_\mathrm{T}(\mathbf{x}) - f(\mathbf{x})}{f(\mathbf{x})}= & {} \frac{f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x}) + \bar{f}_\mathrm{T}(\mathbf{x}) - f(\mathbf{x})}{f(\mathbf{x})}\nonumber \\= & {} \frac{1}{f(\mathbf{x})}\left\{ F_{1,T}(\mathbf{x}) + F_{2,T}(\mathbf{x})\right\} , \end{aligned}$$
(4.9)

where

$$\begin{aligned} \bar{f}_\mathrm{T}(\mathbf{x}) = \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] dt . \end{aligned}$$

Two terms are to be investigated; we first take a closer look at the second term \( F_{2,T}(\mathbf{x})\). We have

$$\begin{aligned} F_{2,T}(\mathbf{x})= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] dt - f(\mathbf{x})\\= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \int _{\mathbb {R}^d} K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u}) d\mathbf{u} dt - f(\mathbf{x})\\= & {} \frac{1}{T} \int _0^T \int _{\mathbb {R}^d} K(\mathbf{r}) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}{} \mathbf{r}) d\mathbf{r} dt - f(\mathbf{x}). \end{aligned}$$

A Taylor expansion of \(f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}\mathbf{r})\) in a neighbourhood of \(\mathbf{x}\), together with assumption (A.4)(i), yields

$$\begin{aligned} f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}{} \mathbf{r})= f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) + h_\mathrm{T} \nabla f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}^*), \end{aligned}$$

where \(\mathbf{x}^*\) lies between \(\mathbf{x}\) and \(\mathbf{x}-h_\mathrm{T}\mathbf{r}\). It follows from assumption (A.4)(i) that

$$\begin{aligned} \left| f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}{} \mathbf{r})- f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) \right| \le C h_\mathrm{T}\Vert \mathbf{r}\Vert . \end{aligned}$$

Making use of assumptions (A.1)(ii) and (A.4)(ii), it follows that

$$\begin{aligned} F_{2,T}(\mathbf{x})= & {} C h_\mathrm{T} \int _{\mathbb {R}^d}\Vert \mathbf{r}\Vert K(\mathbf{r}) d\mathbf{r} + \frac{1}{T} \int _0^T f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt - f(\mathbf{x}) \nonumber \\= & {} o(1),\quad \text{ a.s }. \end{aligned}$$
(4.10)

Now we focus on the first term of the decomposition (4.9), \(F_{1,T}(\mathbf{x})\); it is clear that

$$\begin{aligned} F_{1,T}(\mathbf{x})= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \left( K\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt\\= & {} \frac{1}{Th_\mathrm{T}^{d}} \sum _{k=1}^n Z_{\mathrm{T},k}(\mathbf{x}), \end{aligned}$$

where

$$\begin{aligned} T=n\delta , T_k=k\delta \end{aligned}$$

and

$$\begin{aligned} Z_{\mathrm{T},k}(\mathbf{x})= \int _{\mathrm{T}_{k-1}}^{T_k} \left( K\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt. \end{aligned}$$

We observe that \(\{Z_{\mathrm{T},k}(\mathbf{x})\}_{k=1,\ldots ,n}\) is a sequence of martingale differences with respect to the \(\sigma \)-fields

$$\begin{aligned} \mathcal {F}_{k-1}= \sigma (X_s: 0\le s< T_{k-1}). \end{aligned}$$

Under assumption (A.1), the kernel \(K(\cdot )\) is a compactly supported probability density function; we then obtain

$$\begin{aligned} \left| Z_{\mathrm{T},k}(\mathbf{x}) \right|\le & {} \int _{\mathrm{T}_{k-1}}^{T_k} \left| K\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right| dt \\\le & {} 2\delta \underset{\mathbf{u}\in \mathbb {R}^d}{\sup } |K(\mathbf{u})|\\ {}= & {} 2 \delta \widetilde{K}, \end{aligned}$$

where

$$\begin{aligned} \widetilde{K}= \underset{\mathbf{u}\in \mathbb {R}^d}{\sup } |K(\mathbf{u})|. \end{aligned}$$

Now, for any \(\epsilon >0\), making use of Lemma 4.2, we obtain

$$\begin{aligned} \mathbb {P}\left\{ \left| \sum _{k=1}^n Z_{\mathrm{T},k}(\mathbf{x})\right| > \epsilon (Th_\mathrm{T}^d) \right\}\le & {} 2\exp \left\{ - \frac{\epsilon ^2 (Th_\mathrm{T}^d)^2}{8n \delta ^2 \widetilde{K}^2} \right\} \\= & {} 2\exp \left\{ - \frac{\epsilon ^2 Th_\mathrm{T}^{2d}}{8\delta \widetilde{K}^2} \right\} . \end{aligned}$$

The right-hand side of the last inequality is the general term of a convergent series, so that

$$\begin{aligned} \sum _{n=1}^\infty \mathbb {P}\left\{ \left| \sum _{k=1}^n Z_{\mathrm{T},k}(\mathbf{x})\right| > \epsilon (Th_\mathrm{T}^d) \right\} < \infty , \end{aligned}$$

Hence, by the Borel–Cantelli lemma, we conclude that

$$\begin{aligned} F_{1,T}(\mathbf{x}) = o(1), \quad \text{ a.s }. \end{aligned}$$
(4.11)

The proof is achieved by combining the statements (4.10) and (4.11). \(\square \)
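For illustration purposes only (this numerical sketch is ours, with arbitrary constants, and is not part of the original argument), the Azuma-type tail bound above can be evaluated along the discretisation \(T=n\delta \); its summability over \(n\) is precisely what the Borel–Cantelli step requires.

```python
import numpy as np

# Our sketch (arbitrary constants): the Azuma-type tail
# 2*exp(-eps^2 * T * h_T^(2d) / (8 * delta * Ktilde^2)) along T = n*delta.
d, delta, Ktilde, eps = 1, 1.0, 0.75, 0.5
n = np.arange(1, 200001)
T = n * delta
h = T ** (-0.2)                      # bandwidth such that T * h^(2d) -> infinity
tail = 2.0 * np.exp(-eps**2 * T * h ** (2 * d) / (8 * delta * Ktilde**2))
print(tail.sum())                    # partial sums stabilise: the series converges
```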

The following lemma gives the rate of convergence of \(f_\mathrm{T}(\mathbf{x})\) over a compact set \(\mathfrak {C} \).

Lemma 4.5

(Didi and Louani [26]) Under assumption (A.1), we have

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left| f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x})\right| = O\left( \left( \frac{\log T}{Th_\mathrm{T}^d} \right) ^{1/2}\right) , \quad \text{ a.s. } \end{aligned}$$
(4.12)

4.3 Proof of Lemma 4.5

We refer to Theorem 1 of Didi and Louani [26]. As in the proof of (4.11) in Lemma 4.4, the result is obtained by using Lemma 4.1 instead of Lemma 4.2. \(\square \)
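As a purely illustrative companion to Lemmas 4.4 and 4.5 (our sketch, not the authors' code), the time integral defining \(f_\mathrm{T}\) can be approximated by a Riemann sum along a simulated stationary ergodic path; here an Ornstein–Uhlenbeck process, whose stationary density is standard normal, and an Epanechnikov kernel, with \(d=1\).

```python
import numpy as np

rng = np.random.default_rng(0)

def ou_path(T, dt, theta=1.0, sigma=np.sqrt(2.0)):
    """Euler scheme for dX_t = -theta*X_t dt + sigma dW_t; stationary law N(0,1)."""
    n = int(T / dt)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
    x = np.empty(n)
    x[0] = rng.standard_normal()              # start from the stationary law
    for i in range(1, n):
        x[i] = x[i - 1] * (1.0 - theta * dt) + noise[i]
    return x

def f_T(x0, path, T, h):
    """Riemann sum for (1/(T h)) * int_0^T K((x0 - X_t)/h) dt, Epanechnikov K."""
    dt = T / len(path)
    u = (x0 - path) / h
    K = 0.75 * np.maximum(1.0 - u**2, 0.0)    # compactly supported kernel, cf. (A.1)
    return K.sum() * dt / (T * h)

for T in (100.0, 1000.0, 10000.0):
    h = (np.log(T) / T) ** 0.2                # bandwidth with T*h_T -> infinity
    print(T, f_T(0.0, ou_path(T, 0.01), T, h), 1.0 / np.sqrt(2.0 * np.pi))
```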

In order to complete the proof of Proposition 4.3, we will show Lemma 4.6 and Lemma 4.7 given hereafter.

Lemma 4.6

If hypotheses (A.1)(i), (A.3), (A.4)(i), (A.6)(ii)-(iii) are fulfilled, we have

$$\begin{aligned} \ \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right| =O\left( \left( \frac{\log {T}}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s. } \end{aligned}$$
(4.13)

4.4 Proof of Lemma 4.6

For \(k=1,\ldots ,l\), let \(\mathbf{x}_k \in \mathfrak {C}\). Consider a covering of the compact set \(\mathfrak {C}\) by a finite number \(l\) of spheres \(\mathcal {S}_k\) centred at \(\mathbf{x}_k\), with radius

$$\begin{aligned} r=h_\mathrm{T}^{d+q+1}/T, \end{aligned}$$

we have that

$$\begin{aligned} \mathfrak {C}\subset \bigcup _{k=1}^{l}\mathcal {S}_k. \end{aligned}$$

Then we have

$$\begin{aligned} \ \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right|\le & {} \ \underset{1\le k\le l}{\max }\underset{\mathbf{x} \in \mathcal {S}_k}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) - \Psi _\mathrm{T}(\mathbf{x}_k,\psi ) \right| \\&+ \underset{1\le k\le l}{\max } \left| \Psi _\mathrm{T}(\mathbf{x}_k,\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x}_k,\psi ) \right| \\&+ \ \underset{1\le k\le l}{\max } \underset{\mathbf{x} \in \mathcal {S}_k}{\sup } \left| \bar{\Psi }_\mathrm{T}(\mathbf{x}_k,\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right| \\= & {} \Psi _{1,T}(\mathbf{x},\mathbf{x}_k)+ \Psi _{2,T} (\mathbf{x}_k) + \Psi _{3,T}(\mathbf{x},\mathbf{x}_k). \end{aligned}$$

Making use of the Cauchy–Schwarz inequality together with assumptions (A.1)(i), (A.3), (A.6)(iv) and Lemma 4.4, we readily obtain

$$\begin{aligned}&\left| \Psi _\mathrm{T}(\mathbf{x},\psi ) - \Psi _\mathrm{T}(\mathbf{x}_k,\psi )\right| \nonumber \\&\quad \le \frac{1}{Th_\mathrm{T}^{d}f_\mathrm{T}(\mathbf{x})} \int _0^T \left| \psi (\mathbf{Y}_\mathrm{t})\right| \left| K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| dt \nonumber \\&\quad \le \frac{1}{Th_\mathrm{T}^{d}f_\mathrm{T}(\mathbf{x})} \left( \int _0^T \psi ^2(\mathbf{Y}_\mathrm{t}) dt \right) ^{1/2} \left( \int _0^T \left( K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right) ^{2} dt \right) ^{1/2} \nonumber \\&\quad \le \frac{1}{\sqrt{T}h_\mathrm{T}^{d}f_\mathrm{T}(\mathbf{x})} \left( \frac{1}{T}\int _0^T \psi ^2(\mathbf{Y}_\mathrm{t}) dt \right) ^{1/2} \frac{\sqrt{T}C_K}{h_\mathrm{T}} \Vert \mathbf{x}-\mathbf{x}_k\Vert \nonumber \\&\quad \le \frac{C_K }{h_\mathrm{T}^{d+1}f(\mathbf{x})} \Vert \mathbf{x}-\mathbf{x}_k\Vert \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) \nonumber \\&\quad \le \frac{C_K }{h_\mathrm{T}^{d+1}\lambda } \Vert \mathbf{x}-\mathbf{x}_k\Vert \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) \nonumber \\&\quad \le \frac{C_K }{h_\mathrm{T}^{d+1}\lambda } \frac{h_\mathrm{T}}{T} \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) \nonumber \\&\quad = \frac{C_K }{Th_\mathrm{T}^{d}\lambda } \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) . \end{aligned}$$
(4.14)

Considering the right hand side of statement (4.14) together with the fact that \(\mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] < \infty \), we obtain for

$$\begin{aligned} \epsilon _\mathrm{T}=\epsilon _0\left( \log T/Th_\mathrm{T}^{d}\right) ^{1/2}, \end{aligned}$$

that

$$\begin{aligned} \epsilon _\mathrm{T}^{-1} \Psi _{1,T}(\mathbf{x}, \mathbf{x}_k)= O_{a.s}\left( \left( \frac{1}{Th_\mathrm{T}^{d}\log T} \right) ^{1/2}\right) . \end{aligned}$$
(4.15)

Making use of similar arguments as those used for \(\Psi _{1,T}(\mathbf{x}, \mathbf{x}_k)\), we infer that

$$\begin{aligned}&\left| \bar{\Psi }_\mathrm{T}( \mathbf{x},\psi )-\bar{\Psi }_\mathrm{T}(\mathbf{x}_k,\psi )\right| \\&\quad \le \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| \left| K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt \\&\quad \le \frac{C_K}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| \left\| \frac{\mathbf{x}-\mathbf{x}_k}{h_\mathrm{T}}\right\| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt\\&\quad \le \left\| \frac{\mathbf{x}-\mathbf{x}_k}{h_\mathrm{T}}\right\| \frac{C_K}{Th_\mathrm{T}^{d+1}} \int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt\\&\quad \le \frac{h_\mathrm{T}}{T} \frac{C_K}{h_\mathrm{T}^{d+1}} \left( \frac{1}{T}\int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt\right) \\&\quad \le \frac{C_K}{Th_\mathrm{T}^{d}} O\left( \mathbb {E}\left[ \left| \psi (\mathbf{Y}_0)\right| \right] \right) . \end{aligned}$$

Using (2.5), we get

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \frac{1}{T}\int _0^T \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) | \mathcal {F}_{\mathrm{t}-\delta } \right] dt = \mathbb {E}\left[ \psi (\mathbf{Y}_0) \right] . \end{aligned}$$

This implies that

$$\begin{aligned} \epsilon _\mathrm{T}^{-1} \Psi _{3,T}(\mathbf{x},\mathbf{x}_k)= & {} O\left( \left( \frac{1}{Th_\mathrm{T}^{d}\log T} \right) ^{1/2}\right) ,\quad \text{ a.s }. \end{aligned}$$
(4.16)

Now we deal with \(\Psi _{2,T}(\mathbf{x}_k)\). Observe that

$$\begin{aligned}&\Psi _{2,T}(\mathbf{x}_k)\\&\quad = \underset{1\le k\le l}{\max }\left| \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \left( \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt \right| \\&\quad = \frac{1}{Th_\mathrm{T}^{d}} \underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| , \end{aligned}$$

where

$$\begin{aligned} R_{\mathrm{T},j}(\mathbf{x}_k)= \int _{\mathrm{T}_{j-1}}^{T_j} \left( \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt, \end{aligned}$$

where

$$\begin{aligned} T=n\delta ~ \text{ and } ~T_j= j\delta . \end{aligned}$$

We observe that the sequence \(\big \{ R_{\mathrm{T},j}(\mathbf{x}_k)\big \}_{1\le j \le n}\) is a sequence of martingale differences adapted to the filtration

$$\begin{aligned} \mathcal {F}_{j-1}=\sigma ((\mathbf{X}_s,\mathbf{Y}_s): 0\le s< T_{j-1}). \end{aligned}$$

For \(p\ge 2\), making use of Jensen's and Minkowski's inequalities, we get

$$\begin{aligned}&\left| \mathbb {E} \left[ R_{\mathrm{T},j}^p(\mathbf{x}_k) \mid \mathcal {F}_{j-2}\right] \right| \nonumber \\&\quad = \left| \mathbb {E} \left[ \left( \int _{\mathrm{T}_{j-1}}^{T_j} \left( \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt \right) ^p\mid \mathcal {F}_{j-2}\right] \right| \nonumber \\&\quad \le \int _{\mathrm{T}_{j-1}}^{T_j}\mathbb {E} \left[ \left| \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t})K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right| ^p \mid \mathcal {F}_{j-2}\right] dt \nonumber \\&\quad \le \int _{\mathrm{T}_{j-1}}^{T_j}\left( \mathbb {E} \left[ \psi ^p(\mathbf{Y}_\mathrm{t})K^p\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] ^{1/p}\right. \nonumber \\&\qquad \left. +\mathbb {E} \left[ \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] ^p \mid \mathcal {F}_{j-2}\right] ^{1/p} \right) ^p dt \nonumber \\&\quad \le \int _{\mathrm{T}_{j-1}}^{T_j}\left( 2\mathbb {E} \left[ \psi ^p(\mathbf{Y}_\mathrm{t})K^p\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] ^{1/p} \right) ^p dt \nonumber \\&\quad = 2^p \int _{\mathrm{T}_{j-1}}^{T_j}\mathbb {E} \left[ \psi ^p(\mathbf{Y}_\mathrm{t}) K^p\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] dt. \end{aligned}$$
(4.17)

Furthermore, by assumption (A.6)(iii), we get

$$\begin{aligned}&\mathbb {E} \left[ \left| \psi ^p(\mathbf{Y}_\mathrm{t}) K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| \mid \mathcal {F}_{j-2}\right] \\&\quad =\mathbb {E} \left[ \left| \mathbb {E}\left[ \psi ^p(\mathbf{Y}_\mathrm{t}) K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {S}_{\mathrm{t},\delta }\right] \right| \mid \mathcal {F}_{j-2}\right] \\&\quad =\mathbb {E} \left[ \left| \mathbb {E}\left[ \psi ^p(\mathbf{Y}_\mathrm{t}) \mid \mathcal {S}_{\mathrm{t},\delta }\right] K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| \mid \mathcal {F}_{j-2}\right] \\&\quad = \mathbb {E} \left[ \left| h_p(\mathbf{X}_\mathrm{t})\right| K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \\&\quad \le \mathbb {E} \left[ \left| h_p(\mathbf{X}_\mathrm{t})-h_p(\mathbf{x})\right| K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \\&\qquad + \mathbb {E} \left[ \left| h_p(\mathbf{x})\right| K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \\&\quad \le \mathbb {E} \left[ K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \left( \underset{\Vert \mathbf{x}-\mathbf{u}\Vert \le \lambda h_\mathrm{T}}{\sup }\left| h_p(\mathbf{u})-h_p(\mathbf{x})\right| +\left| h_p(\mathbf{x})\right| \right) \\&\quad \le \eta (\mathbf{x}) \mathbb {E} \left[ K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] , \end{aligned}$$

where \(\eta (\mathbf{x})\) is a positive constant. We infer from condition (A.4)(i) that

$$\begin{aligned} \mathbb {E} \left[ K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right]= & {} \int _{\mathbb {R}^{d}} K^p\left( \frac{\mathbf{x}-\mathbf{v}}{h_\mathrm{T}}\right) f_\mathrm{T}^{\mathcal {F}_{j-2}}(\mathbf{v}) d\mathbf{v} \nonumber \\= & {} h_\mathrm{T}^{d} \int _{\mathbb {R}^{d}} K^p(\mathbf{w}) f_\mathrm{T}^{\mathcal {F}_{j-2}}(\mathbf{x}-h_\mathrm{T}{} \mathbf{w}) d\mathbf{w} \nonumber \\\le & {} h_\mathrm{T}^{d} \left\| K \right\| ^p. \end{aligned}$$
(4.18)

Notice that the bound (4.17) can be rewritten, using (4.18), as follows

$$\begin{aligned} \mathbb {E} \left[ R_{\mathrm{T},j}^p(\mathbf{x}_k) \mid \mathcal {F}_{j-2}\right]\le & {} 2^p \eta (\mathbf{x}) \delta h_\mathrm{T}^{d} \left\| K \right\| ^p\nonumber \\\le & {} p! C^{p-2} d_j^2, \end{aligned}$$
(4.19)

where \(C=2\left\| K \right\| \) and

$$\begin{aligned} d_j^2=2\delta \eta (\mathbf{x}) h_\mathrm{T}^{d} \Vert K \Vert ^2. \end{aligned}$$

Let

$$\begin{aligned} D_n=\sum _{j=1}^n d_j^2=\sum _{j=1}^n 2\delta \eta (\mathbf{x}) h_\mathrm{T}^{d} \Vert K \Vert ^2 = O(T h_\mathrm{T}^{d}). \end{aligned}$$

An application of Lemma 4.1 and keeping in mind that \(\epsilon _\mathrm{T}=\epsilon _0\left( \log T/Th_\mathrm{T}^{d}\right) ^{1/2} \), we get, for any \(\epsilon _0>0\),

$$\begin{aligned}&\mathbb {P} \left\{ \underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right|> \epsilon _\mathrm{T} (Th_\mathrm{T}^{d}) \right\} \\&\quad \le \sum _{k=1}^{l} \mathbb {P} \left\{ \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| > \epsilon _\mathrm{T} (Th_\mathrm{T}^{d}) \right\} \\&\quad \le 2l \exp \left\{ - \frac{\epsilon _\mathrm{T}^2 (Th_\mathrm{T}^{d})^2}{2(D_n+C(Th_\mathrm{T}^{d})\epsilon _\mathrm{T})} \right\} \\&\quad = 2l \exp \left\{ - \frac{\epsilon _0^2 (Th_\mathrm{T}^{d}) \log T}{O(Th_\mathrm{T}^{d})\left( 1+ \epsilon _0\left( \frac{\log T}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) }\right\} \\&\quad := 2l\exp \left\{ \log {T^{-\epsilon _0^2 C_1}} \right\} \\&\quad = 2lT^{-\epsilon _0^2 C_1 }, \end{aligned}$$

where \(C_1\) is a positive constant. Since \(l\) grows only polynomially in \(T\), the right-hand side of the previous inequality is, for \(\epsilon _0\) large enough, the general term of a convergent series; hence, via the Borel–Cantelli lemma, we obtain

$$\begin{aligned} \sum _{n=1}^\infty \mathbb {P} \left\{ \underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| > \epsilon _\mathrm{T} (Th_\mathrm{T}^{d}) \right\} < \infty . \end{aligned}$$

This, in turn, implies that

$$\begin{aligned} \frac{1}{Th_\mathrm{T}^{d}}\underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| =O\left( \left( \frac{\log T}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s }. \end{aligned}$$
(4.20)

By combining (4.15), (4.16) and (4.20) with Lemma 4.4, we obtain

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right| =O\left( \left( \frac{\log T}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s }. \end{aligned}$$
(4.21)

Therefore the proof is complete. \(\square \)
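The covering step in the proof above is purely constructive; the following sketch (ours, taking \(d=q=1\) and \(\mathfrak {C}=[-1,1]\)) computes the radius \(r=h_\mathrm{T}^{d+q+1}/T\) and the resulting number \(l\) of balls, which grows only polynomially in \(T\) and is therefore absorbed by the exponential factor \(T^{-\epsilon _0^2 C_1}\) in the union bound.

```python
import numpy as np

d, q = 1, 1
for T in (1e2, 1e3, 1e4):
    h = T ** (-0.2)
    r = h ** (d + q + 1) / T                 # radius used in the proof of Lemma 4.6
    l = int(np.ceil(2.0 / (2.0 * r)))        # number of balls covering C = [-1, 1]
    print(f"T={T:.0e}  r={r:.2e}  l={l}")    # l grows like T^1.6: polynomial in T
```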

We next evaluate the term \(B_\mathrm{T}(\mathbf{x},\psi )\) defined in (4.5).

Lemma 4.7

Under assumptions (A.1) and (A.6)(i)–(ii), we have

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi )\right| =O\left( h_\mathrm{T}^{\beta }\right) . \end{aligned}$$
(4.22)

4.5 Proof of Lemma 4.7

First, we will use the notation

$$\begin{aligned} K_{h_\mathrm{T}}(\cdot )= \frac{1}{h_\mathrm{T}^d} K\left( \frac{\cdot }{h_\mathrm{T}}\right) . \end{aligned}$$

We let

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi )\right| =\underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \frac{B_\mathrm{T}^\star (\mathbf{x},\psi )}{\bar{f}_\mathrm{T}(\mathbf{x})}\right| . \end{aligned}$$

Observe that assumption (A.6)(i) implies that

$$\begin{aligned} B_\mathrm{T}^\star (\mathbf{x},\psi )= & {} \bar{\Psi }_\mathrm{T}(\mathbf{x},\psi )- \bar{f}_\mathrm{T}(\mathbf{x}) m(\mathbf{x},\psi )\\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ \left( \psi (\mathbf{Y}_\mathrm{t})-m(\mathbf{x},\psi ) \right) K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \mathbb {E}\left[ \left( \psi (\mathbf{Y}_\mathrm{t})-m(\mathbf{x},\psi ) \right) \mid \mathcal {S}_{\mathrm{t},\delta } \right] \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \left( \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t})\mid \mathbf{X}_\mathrm{t} \right] - m(\mathbf{x},\psi ) \right) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \left( m(\mathbf{X}_\mathrm{t},\psi )- m(\mathbf{x},\psi ) \right) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt. \end{aligned}$$

Under assumption (A.6)(ii), we have

$$\begin{aligned} |B_\mathrm{T}^\star (\mathbf{x},\psi )|\le & {} \underset{\Vert \mathbf{u}-\mathbf{x}\Vert \le h_\mathrm{T} \lambda }{\sup } \left| m(\mathbf{u},\psi )- m(\mathbf{x},\psi ) \right| \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\\le & {} C_\psi \lambda ^\beta h_\mathrm{T}^\beta \frac{1}{Th_\mathrm{T}^d}\int _0^T \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} C_\psi \lambda ^\beta h_\mathrm{T}^\beta \bar{f}_\mathrm{T}(\mathbf{x}). \end{aligned}$$

We obtain that

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi )\right|= & {} O\left( h_\mathrm{T}^\beta \right) ,\quad \text{ a.s. } \end{aligned}$$
(4.23)

The proof of the lemma is therefore completed. \(\square \)
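The order \(h_\mathrm{T}^{\beta }\) in (4.22) comes solely from the Hölder-type condition on \(m(\cdot ,\psi )\); this can be seen numerically with the toy sketch below (ours; the target function and kernel are arbitrary choices with \(\beta =1/2\)).

```python
import numpy as np

def m(x):
    return np.sqrt(np.abs(x))      # beta-Holder at 0 with beta = 1/2

u = np.linspace(-1.0, 1.0, 2001)
du = u[1] - u[0]
K = 0.75 * (1.0 - u**2)            # Epanechnikov kernel on [-1, 1]
xs = np.linspace(-0.5, 0.5, 201)
for h in (0.2, 0.1, 0.05, 0.025):
    # sup over x of | int K(u) * (m(x - h*u) - m(x)) du |, cf. the bias term (4.22)
    bias = [np.sum(K * (m(x - h * u) - m(x))) * du for x in xs]
    print(h, np.max(np.abs(bias))) # shrinks by ~2^(-1/2) each time h is halved
```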

Recalling (4.21), the proof of Proposition 4.3 is completed by combining Lemmas 4.4, 4.5 and 4.7. \(\square \)

In the following lemma, we give the almost sure convergence of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\).

Lemma 4.8

Under the hypotheses of Theorem 2.4, we have, as \(T\rightarrow \infty \),

$$\begin{aligned} \Vert \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\Vert \overset{a.s}{\longrightarrow } 0. \end{aligned}$$

4.6 Proof of Lemma 4.8

The uniqueness hypothesis on the mode \({\varvec{\Theta }}\) of the regression function gives

$$\begin{aligned} \forall \epsilon>0, \exists \eta (\epsilon ) >0; \forall \xi : \Vert {\varvec{\Theta }}- \xi \Vert \ge \epsilon \Rightarrow \left| m({\varvec{\Theta }},\psi )-m(\xi ,\psi ) \right| \ge \eta (\epsilon ) . \end{aligned}$$
(4.24)

Combining conditions (4.24) and (4.1), we obtain, for any fixed \(\mathbf{x} \in \mathfrak {C}\) and all \(\epsilon >0\), that there exists a \(\xi >0\) such that

$$\begin{aligned} \mathbb {P}\left\{ \Vert {\varvec{\Theta }}_\mathrm{T}-{\varvec{\Theta }} \Vert \ge \epsilon \right\} \le \mathbb {P}\left\{ \underset{\mathbf{x}\in \mathfrak {C}}{ \sup } |m_\mathrm{T}(\mathbf{x},\psi )- m(\mathbf{x},\psi )|\ge \xi \right\} . \end{aligned}$$
(4.25)

This gives the desired result, provided that the right-hand side of Eq. (4.25) converges almost surely to zero. The proof is therefore completed by using Proposition 4.3. \(\square \)
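In practice, \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) is computed by maximising \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\) over a grid of \(\mathfrak {C}\). The sketch below is our minimal illustration in the univariate setting, with i.i.d. draws standing in for the sampled path and a Gaussian kernel standing in for the compactly supported \(K\) of assumption (A.1); all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def m_hat(x_grid, X, Y, h, psi=lambda y: y):
    """Kernel estimate of m(x, psi) on a grid (Gaussian kernel for simplicity)."""
    W = np.exp(-0.5 * ((x_grid[:, None] - X[None, :]) / h) ** 2)
    return (W * psi(Y)[None, :]).sum(axis=1) / np.maximum(W.sum(axis=1), 1e-12)

# m(x) = -(x - 0.3)^2 has a unique mode Theta = 0.3 on C = [0, 1]
n = 5000
X = rng.uniform(0.0, 1.0, n)
Y = -(X - 0.3) ** 2 + 0.05 * rng.standard_normal(n)
grid = np.linspace(0.0, 1.0, 501)
theta_hat = grid[np.argmax(m_hat(grid, X, Y, h=n ** (-0.2)))]
print(theta_hat)                   # close to the true mode 0.3
```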

The following lemma gives the uniform convergence of \(\widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )\) over the compact set \(\mathfrak {C}\). To simplify the presentation, from now on all our arguments will be given in the univariate setting. The extension to the multivariate setting follows easily.

Lemma 4.9

If assumptions (A.1)(ii), (A.3), (A.4)(i), (A.5), (A.6)(i) and (A.7)(i) are fulfilled, we have, as \(T\rightarrow \infty \),

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left\| \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-m^{(2)}(\mathbf{x},\psi )\right\| \longrightarrow 0,\quad \text{ almost } \text{ surely }. \end{aligned}$$
(4.26)

4.7 Proof of Lemma 4.9

We first observe that

$$\begin{aligned} \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )= & {} \left( \frac{\widehat{M}_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})}\right) ^{(2)}\\= & {} \frac{\left( \widehat{M}_\mathrm{T}^{(1)}(\mathbf{x},\psi )f(\mathbf{x})-f^{(1)}(\mathbf{x})\widehat{M}_\mathrm{T}(\mathbf{x},\psi )\right) ^{(1)}}{f^2(\mathbf{x})}\\&-\frac{2f^{(1)} (\mathbf{x})\left( \widehat{M}_\mathrm{T}^{(1)}(\mathbf{x},\psi )f(\mathbf{x})-f^{(1)}(\mathbf{x})\widehat{M}_\mathrm{T}(\mathbf{x},\psi )\right) }{f^3(\mathbf{x})} \\= & {} \frac{\widehat{M}_\mathrm{T}^{(2)}(\mathbf{x},\psi )}{f(\mathbf{x})}-\frac{2f^{(1)}(\mathbf{x})\widehat{M}_\mathrm{T}^{(1)}(\mathbf{x},\psi )}{f^2(\mathbf{x})}\\&+\frac{\widehat{M}_\mathrm{T}(\mathbf{x},\psi )\left( 2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})\right) }{f^3(\mathbf{x})}\\= & {} \frac{1}{f(\mathbf{x})}\frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\\&-\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\\&+\left( \frac{2(f^{(1)}(\mathbf{x}))^2}{f^3(\mathbf{x})}-\frac{f^{(2)}(\mathbf{x})}{f^2(\mathbf{x})}\right) \frac{1}{Th^{d}_\mathrm{T}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt. \end{aligned}$$

Let us define

$$\begin{aligned} \widetilde{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )= & {} \frac{1}{f(\mathbf{x})}\frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&-\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&+\left( \frac{2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})}{f^3(\mathbf{x})}\right) \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt. \end{aligned}$$

Consider the following decomposition

$$\begin{aligned} \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-m^{(2)}(\mathbf{x},\psi )= & {} \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-\widetilde{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )\nonumber \\&~+\widetilde{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-m^{(2)}(\mathbf{x},\psi )\nonumber \\= & {} A_{\mathrm{T},1}(\mathbf{x},\psi )+A_{\mathrm{T},2}(\mathbf{x},\psi ). \end{aligned}$$
(4.27)

To achieve the asymptotic uniform convergence over the compact set \(\mathfrak {C}\) of the term \(A_{\mathrm{T},1}(\mathbf{x},\psi )\) in the decomposition (4.27), we have to prove that

$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+2}_\mathrm{T}}\left( \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right. \right. \nonumber \\&\quad \left. \left. - \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right) \right\| = o_{a.s}(1), \end{aligned}$$
(4.28)
$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+1}_\mathrm{T}}\left( \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt \right. \right. \nonumber \\&\quad \left. \left. -\int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right) \right\| =o_{a.s}(1), \end{aligned}$$
(4.29)
$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d}_\mathrm{T}}\left( \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right. \right. \nonumber \\&\quad \left. \left. - \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right) \right\| =o_{a.s}(1). \end{aligned}$$
(4.30)

Using a simple integration by parts together with Lemma 4.6, we obtain (4.28)–(4.30); combining assumptions (A.1), (A.3), (A.4), (A.6)(i)–(ii) with statements (4.28)–(4.30), we obtain

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \Vert A_{\mathrm{T},1}(\mathbf{x},\psi )\Vert = o_{a.s}(1). \end{aligned}$$
(4.31)

Remark that

$$\begin{aligned} m^{(2)}(\mathbf{x},\psi )= & {} \frac{M^{(2)}(\mathbf{x},\psi )}{f(\mathbf{x})}-\frac{2f^{(1)}(\mathbf{x})M^{(1)}(\mathbf{x},\psi )}{f^2(\mathbf{x})}\nonumber \\&+\frac{\left( 2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})\right) M(\mathbf{x},\psi )}{f^3(\mathbf{x})}. \end{aligned}$$
(4.32)

We now treat the second term \(A_{\mathrm{T},2}(\mathbf{x},\psi )\) in (4.27). We have

$$\begin{aligned}&A_{\mathrm{T},2}(\mathbf{x},\psi )\\&\quad = \frac{1}{f(\mathbf{x})}\frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\qquad -\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\qquad +\left( \frac{2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})}{f^3(\mathbf{x})}\right) \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\qquad -\frac{M^{(2)}(\mathbf{x},\psi )}{f(\mathbf{x})}+\frac{2f^{(1)}(\mathbf{x})M^{(1)}(\mathbf{x},\psi )}{f^2(\mathbf{x})}-\frac{\left( 2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})\right) M(\mathbf{x},\psi )}{f^3(\mathbf{x})}\\&\quad = \frac{1}{f(\mathbf{x})}\left( \frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(2)}(\mathbf{x},\psi )\right) \\&\qquad -\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\left( \frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(1)}(\mathbf{x},\psi )\right) \\&\qquad + \left( \frac{2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})}{f^3(\mathbf{x})}\right) \left( \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -M(\mathbf{x},\psi )\right) . \end{aligned}$$

To achieve the asymptotic uniform convergence over the compact set \(\mathfrak {C}\) of the term \(A_{\mathrm{T},2}(\mathbf{x},\psi )\), we have to show the following statements

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -M(\mathbf{x},\psi )\right|= & {} o_{a.s}(1), \nonumber \\ \end{aligned}$$
(4.33)
$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(1)}(\mathbf{x},\psi )\right\|= & {} o_{a.s}(1), \nonumber \\ \end{aligned}$$
(4.34)
$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(2)}(\mathbf{x},\psi ) \right\|= & {} o_{a.s}(1).\nonumber \\ \end{aligned}$$
(4.35)

Observe that statement (4.33) may be rewritten as follows

$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -M(\mathbf{x},\psi )\right| \\&\quad = \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -m(\mathbf{x},\psi )f(\mathbf{x})\right| . \end{aligned}$$

The desired result can be obtained in a similar way to statement (4.23). We have

$$\begin{aligned} M^{(1)}(\mathbf{x},\psi )= & {} \left( m(\mathbf{x},\psi )f(\mathbf{x})\right) ^{(1)} \nonumber \\= & {} m^{(1)}(\mathbf{x},\psi )f(\mathbf{x})+m(\mathbf{x},\psi )f^{(1)}(\mathbf{x}). \end{aligned}$$
(4.36)

On the other hand, under assumption (A.6)(i), we have

$$\begin{aligned}&\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\quad =\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \big | \mathcal {S}_{\mathrm{t},\delta }\right] \big | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\quad =\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ m(\mathbf{X}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \big | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\quad = \frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\int _{\mathbb {R}^d}m(\mathbf{u}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})d\mathbf{u}dt. \end{aligned}$$

To integrate by parts, set

$$\begin{aligned} U(\mathbf{u})=m(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})\rightarrow & {} U^{(1)}(\mathbf{u})=m^{(1)}(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})+m(\mathbf{u})\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{u}),\\ V^{(1)}(\mathbf{u})=\frac{1}{h_\mathrm{T}}K^{(1)}\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right)\rightarrow & {} V(\mathbf{u})=-K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) . \end{aligned}$$

By integrating by parts and the change of variable \(\mathbf{y} =\frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\) combined with Taylor expansions of order one, under assumptions (A.4)(i), (A.5) and (A.7)(i), we readily obtain

$$\begin{aligned}&\frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\nonumber \\&\quad = \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\left( \left[ m(\mathbf{u}) K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})\right] _{\mathbb {R}^d} \right. \nonumber \\&\qquad \left. + \int _{\mathbb {R}^d} K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) \left( m^{(1)}(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})+m(\mathbf{u})\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{u})\right) d\mathbf{u} \right) dt\nonumber \\&\quad =\frac{1}{Th^{d}_\mathrm{T}} \int _0^T\int _{\mathbb {R}^d} K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) \left( m^{(1)}(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})+m(\mathbf{u})\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{u})\right) d\mathbf{u} dt\nonumber \\&\quad =\frac{1}{T} \int _0^T\int _{\mathbb {R}^d} K(\mathbf{y}) \left( m^{(1)}(\mathbf{x})+O(h_\mathrm{T})\right) \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x})+O(h_\mathrm{T}) \right) d\mathbf{y} dt \nonumber \\&\qquad + \frac{1}{T} \int _0^T\int _{\mathbb {R}^d} K(\mathbf{y}) \left( m(\mathbf{x})+O(h_\mathrm{T})\right) \left( \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{x})+O(h_\mathrm{T})\right) d\mathbf{y} dt\nonumber \\&\quad =\left( m^{(1)}(\mathbf{x}) \left( \frac{1}{T} \int _0^Tf^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt \right) \right. \nonumber \\&\qquad \left. +m(\mathbf{x}) \left( \frac{1}{T} \int _0^T \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{x}) dt \right) \right) \int _{\mathbb {R}^d} K(\mathbf{y})d\mathbf{y} +O(h_\mathrm{T})\nonumber \\&\quad = m^{(1)}(\mathbf{x}) \left( \frac{1}{T} \int _0^T f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt \right) +m(\mathbf{x}) \left( \frac{1}{T} \int _0^T\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)} (\mathbf{x}) dt \right) +O(h_\mathrm{T}), \end{aligned}$$
(4.37)

where

$$\begin{aligned} g_\mathrm{t}^{\mathcal {F}_{\mathrm{t}-\delta } } (\mathbf{x}) = \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{x}) \end{aligned}$$

is a stationary and ergodic process. Therefore, one has (see Krengel [47], Theorem 4.4)

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \underset{\mathbf{x} \in \mathbb {R}^d}{\sup } \left| \frac{1}{T} \int _0^T g_\mathrm{t}^{\mathcal {F}_{\mathrm{t}-\delta } } (\mathbf{x})dt - \mathbb {E}\left[ g_1^{\mathcal {F}_{-\delta } } (\mathbf{x}) \right] \right| = 0, \end{aligned}$$
(4.38)

where

$$\begin{aligned} \mathbb {E}\left[ g_1^{\mathcal {F}_{-\delta } } (\mathbf{x}) \right] = f^{(1)} (\mathbf{x}). \end{aligned}$$

By combining the statements (4.37) and (4.38), we conclude the proof of (4.34). Moreover, statement (4.35) may be proved in the same way as statement (4.34), keeping in mind that

$$\begin{aligned} M^{(2)}(\mathbf{x},\psi )= & {} \left( m^{(1)}(\mathbf{x},\psi )f(\mathbf{x})+m(\mathbf{x},\psi )f^{(1)}(\mathbf{x})\right) ^{(1)}\\= & {} m^{(2)}(\mathbf{x},\psi )f(\mathbf{x})+2m^{(1)}(\mathbf{x},\psi )f^{(1)}(\mathbf{x})\\&+m(\mathbf{x},\psi )f^{(2)}(\mathbf{x}). \end{aligned}$$

By applying integration by parts twice, we obtain (4.35). Combining statements (4.33), (4.34) and (4.35) yields

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| A_{\mathrm{T},2}(\mathbf{x},\psi )\right| = o_{a.s}(1). \end{aligned}$$
(4.39)

Statements (4.31) and (4.39) complete the proof of Lemma 4.9. \(\square \)
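The decomposition of \(\widehat{m}^{(2)}_\mathrm{T}\) used in the proof translates directly into a plug-in estimator built from \(K\), \(K^{(1)}\) and \(K^{(2)}\). The sketch below is ours: it uses a Gaussian kernel (whose derivatives are explicit) and replaces \(f\), \(f^{(1)}\), \(f^{(2)}\) by kernel estimates, in an i.i.d. toy model standing in for the sampled path.

```python
import numpy as np

rng = np.random.default_rng(3)

# Gaussian kernel and its first two derivatives (d = 1)
K0 = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
K1 = lambda u: -u * K0(u)
K2 = lambda u: (u**2 - 1.0) * K0(u)

def m2_hat(x, X, Y, h):
    """Plug-in version of the decomposition of m_T^(2) used in Lemma 4.9."""
    u = (x - X) / h
    f0 = K0(u).mean() / h          # kernel estimates of f, f', f''
    f1 = K1(u).mean() / h**2
    f2 = K2(u).mean() / h**3
    M0 = (Y * K0(u)).mean() / h    # kernel estimates of M, M', M''
    M1 = (Y * K1(u)).mean() / h**2
    M2 = (Y * K2(u)).mean() / h**3
    return M2 / f0 - 2.0 * f1 * M1 / f0**2 + (2.0 * f1**2 - f0 * f2) * M0 / f0**3

n = 200000
X = rng.uniform(-1.0, 1.0, n)
Y = np.sin(2.0 * X) + 0.1 * rng.standard_normal(n)   # m(x) = sin(2x), m'' = -4 sin(2x)
print(m2_hat(0.5, X, Y, h=0.2), -4.0 * np.sin(1.0))  # close, up to smoothing bias
```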

4.8 Proof of Theorem 2.4

Under assumption (A.3)(ii), a second-order Taylor expansion of \(m({\varvec{\Theta }}_\mathrm{T},\psi )\) around \({\varvec{\Theta }}\), using that \(m^{(1)}({\varvec{\Theta }},\psi )=0\) at the mode, gives

$$\begin{aligned} m({\varvec{\Theta }}_\mathrm{T},\psi )= m({\varvec{\Theta }},\psi ) + \frac{1}{2}({\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }})^{\top } m^{(2)}({\varvec{\Theta }}_\mathrm{T}^\star ,\psi )({\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }}), \end{aligned}$$
(4.40)

where \({\varvec{\Theta }}_\mathrm{T}^\star \) lies between \({\varvec{\Theta }}_\mathrm{T}\) and \({\varvec{\Theta }}\). It follows from equations (4.1) and (4.40) that

$$\begin{aligned} \Vert {\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }}\Vert ^{2} \left\| m^{(2)}({\varvec{\Theta }}_\mathrm{T}^\star ,\psi ) \right\| = O \left( \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi ) \right| \right) . \end{aligned}$$

Using Lemma 4.8 and condition (A.7)(ii), one obtains

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \left\| m^{(2)}({\varvec{\Theta }}_\mathrm{T}^\star ,\psi ) \right\| = \left\| m^{(2)}({\varvec{\Theta }},\psi ) \right\| \ne 0. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert {\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }}\Vert ^2 = O \left( \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi ) \right| \right) , \end{aligned}$$
(4.41)

which is enough, while considering Proposition 4.3, to complete the proof. \(\square \)

4.9 Proof of Theorem 2.5

By using formula (2.4), we readily obtain

$$\begin{aligned} \sqrt{Th_\mathrm{T}^{d+2}}\ \ \widehat{m}_\mathrm{T}^{(1)}({\varvec{\Theta }},\psi ) = \sqrt{Th_\mathrm{T}^{d+2}}\ \ (\widehat{{{\varvec{\Theta }} }}_\mathrm{T} - {\varvec{\Theta }}) \ \ \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ), \end{aligned}$$

where \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star \) is a random variable taking its values between \({\varvec{\Theta }}\) and \( \widehat{{{\varvec{\Theta }} }}_\mathrm{T}\). From the hypothesis made on \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), it results that \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star \) also converges a.s. towards \({\varvec{\Theta }}\). The continuity of the function \(m^{(2)}(\cdot ,\psi )\) leads to

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } m^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ) = m^{(2)}({\varvec{\Theta }},\psi ). \end{aligned}$$

For T large enough, we have almost surely

$$\begin{aligned} \left| \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ) -m^{(2)}({\varvec{\Theta }},\psi ) \right|\le & {} \underset{x\in \mathfrak {C}}{\sup } \left| \widehat{m}_\mathrm{T}^{(2)}(\mathbf{x},\psi ) -m^{(2)}(\mathbf{x},\psi ) \right| \\&+\left| m^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ) -m^{(2)}({\varvec{\Theta }},\psi ) \right| . \end{aligned}$$

The uniform convergence in probability of \( \widehat{m}_\mathrm{T}^{(2)}(\cdot ,\psi )\) to \(m^{(2)}(\cdot ,\psi )\) over \(\mathfrak {C}\) implies the convergence in probability of the sequence \( \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi )\) to the non-null real \(m^{(2)}({\varvec{\Theta }},\psi )\). The conclusion then results from the asymptotic normality of \(\widehat{m}_\mathrm{T}^{(1)}({\varvec{\Theta }},\psi )\), since

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } f_\mathrm{T}(\mathbf{x})=f(\mathbf{x}) \end{aligned}$$

almost surely and uniformly on the set \(\mathfrak {C}\); see [26] for details. Notice that we have

$$\begin{aligned} \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \left( \frac{M_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})}\right) ^{(1)}\nonumber \\= & {} \frac{M_\mathrm{T}^{(1)}(\mathbf{x},\psi )f(\mathbf{x})-f^{(1)}(\mathbf{x})M_\mathrm{T}(\mathbf{x},\psi )}{f^2(\mathbf{x})}\nonumber \\= & {} \frac{1}{f^2(\mathbf{x})}\left( \frac{f(\mathbf{x})}{Th_\mathrm{T}^{d+1}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt \right. \nonumber \\&\left. -\frac{f^{(1)}(\mathbf{x})}{Th_\mathrm{T}^{d}}\int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right) , \end{aligned}$$
(4.42)

that is,

$$\begin{aligned} \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \frac{1}{Th_\mathrm{T}^{d+1}f^2(\mathbf{x})}\left[ f(\mathbf{x}) \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt \right. \\&\left. -h_\mathrm{T}f^{(1)}(\mathbf{x})\int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right] , \end{aligned}$$

and

$$\begin{aligned} \widetilde{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \frac{1}{Th_\mathrm{T}^{d+1}f^2(\mathbf{x})}\left[ f(\mathbf{x}) \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{i-2}\right] dt \right. \\&\left. -h_\mathrm{T}f^{(1)}(\mathbf{x})\int _0^T \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right] . \end{aligned}$$

We will make use of the following additional notation

$$\begin{aligned} W_{i}(\mathbf{x},\psi )= & {} \frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{(Th_\mathrm{T}^{d})^{1/2}f^2(\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i} \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt,\\ \Delta _i(\mathbf{x},\psi )= & {} \left( W_{i}(\mathbf{x},\psi ) - \mathbb {E}\left[ W_{i}(\mathbf{x},\psi )| \mathcal {F}_{i-2 }\right] \right) ,\\ \sigma ^2(\mathbf{x},\psi )= & {} \frac{\Phi _2(\mathbf{x},\psi )}{f(\mathbf{x})}\int _{\mathbb {R}^{d} }\left[ K^{(1)}(\mathbf{u}) \right] ^2d\mathbf{u}, \end{aligned}$$

where

$$\begin{aligned} \Phi _2(\mathbf{x},\psi )=\mathbb {E}\left[ \psi ^2(\mathbf{Y})\mid \mathbf{X}=\mathbf{x}\right] . \end{aligned}$$

Observe that

$$\begin{aligned}&(Th_\mathrm{T}^{d+2})^{1/2} \left( \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )-\widetilde{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )\right) \nonumber \\&\quad =\frac{(Th_\mathrm{T}^{d+2})^{1/2}}{Th_\mathrm{T}^{d+1}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \nonumber \\&\qquad \times \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i}\left( \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) -\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{i-2 }\right] \right) dt \nonumber \\&\quad =\sum _{i=1}^n \Delta _i(\mathbf{x},\psi ). \end{aligned}$$
(4.43)

Lemma 4.10, stated below, will play an instrumental role in the proof of Theorem 2.5.

Lemma 4.10

Under assumptions (A.1), (A.3)(i)–(ii), (A.4) and (A.6), as \(n \rightarrow \infty \), we have

$$\begin{aligned} \sum _{i=1}^n \Delta _i(\mathbf{x},\psi )=\sum _{i=1}^n\left( W_{i}(\mathbf{x},\psi ) - \mathbb {E}\left[ W_{i}(\mathbf{x},\psi )| \mathcal {F}_{i-2 }\right] \right) \overset{D}{\rightarrow } N(0,\sigma ^2(\mathbf{x},\psi )). \end{aligned}$$

4.10 Proof of Lemma 4.10

It is easily seen that \((\Delta _{i}(\mathbf{x},\psi ))_{1\le i\le n}\) is a sequence of martingale differences with respect to the sequence of \(\sigma \)-fields \((\mathcal {F}_{i-1})_{1\le i\le n}\). Therefore, we have to check the following two conditions.

(a)
    $$\begin{aligned} \sum _{i=1}^n \mathbb E\left[ \Delta _{i}^2(\mathbf{x},\psi )| \mathcal {F}_{ i-2}\right] \overset{\mathbb {P}}{\rightarrow } \sigma ^2( \mathbf{x},\psi ); \end{aligned}$$
(b)
    $$\begin{aligned} n \mathbb E\left[ \Delta _{i}^2(\mathbf{x},\psi ) \mathbb {1}_{\{|\Delta _{i}(\mathbf{x},\psi )|> \epsilon \}}\right] = o(1)~~\text{ holds, } \text{ for } \text{ any } ~~\epsilon >0. \end{aligned}$$

These conditions are sufficient to establish the asymptotic normality of discrete-time martingale difference sequences (see, for instance, Hall and Heyde [37]).
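Conditions (a) and (b) are the classical conditional-variance and conditional-Lindeberg conditions. The toy simulation below is ours (unrelated to the data structure of the paper) and only illustrates the mechanism: a bounded martingale-difference array, normalised by the sum of its conditional variances, produces an approximately standard normal sum.

```python
import numpy as np

rng = np.random.default_rng(4)

def normalised_sum(n):
    """One realisation of sum(D_i) / sqrt(sum of conditional variances)."""
    eps = rng.choice([-1.0, 1.0], size=n)      # E[eps_i | past] = 0
    s = 1.0 + 0.5 * np.cos(np.arange(n))       # predictable scale factors
    # D_i = s_i * eps_i is bounded, so the Lindeberg condition (b) holds trivially
    return (s * eps).sum() / np.sqrt((s**2).sum())

Z = np.array([normalised_sum(2000) for _ in range(5000)])
print(Z.mean(), Z.std())                       # approximately 0 and 1
```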

4.11 Proof of (a)

First, observe that

$$\begin{aligned} \left| \sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] - \sum _{i=1}^n \mathbb {E} \left[ \Delta ^2_{i}(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \right| = \left| \sum _{i=1}^n \left( \mathbb {E} \left[ W_{i}(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \right) ^2\right| . \end{aligned}$$

Note that, for \(T_{i-1}\le t\le T_i\), we have \(\mathcal {S}_{\mathrm{t},\delta }\subset \mathcal {F}_{i-2}\). Therefore, making use of condition (A.6)(i), we obtain

$$\begin{aligned}&\left| \mathbb {E} \left[ W_{i}(\mathbf{x},\psi )\big | \mathcal {F}_{i-2} \right] \right| \\&\quad = \left| \frac{1}{(Th_\mathrm{T}^{d})^{1/2}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E} \left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \big | \mathcal {F}_{i-2} \right] dt \right| \\&\quad \le \frac{1}{(Th_\mathrm{T}^{d})^{1/2}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \int _{\mathrm{T}_{i-1} }^{T_i} \mathbb {E} \left[ \left| \psi (\mathbf{Y}_\mathrm{t}) \right| K^{(1)}\left( \frac{ \mathbf{x} - \mathbf{X}_\mathrm{t} }{h_\mathrm{T}}\right) \big | \mathcal {F}_{i-2} \right] dt\\&\quad \le \frac{\mathcal {M}_\psi }{(Th_\mathrm{T}^{d})^{1/2}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ K^{(1)}\left( \frac{ \mathbf{x} - \mathbf{X}_\mathrm{t} }{h_\mathrm{T}}\right) \big | \mathcal {F}_{i-2} \right] dt. \end{aligned}$$

Using Taylor’s formula combined with assumption (A.4), we obtain

$$\begin{aligned}&\left| \sum _{i=1}^n \left( \mathbb {E} \left[ W_{i}(\mathbf{x},\psi )\big | \mathcal {F}_{i-2} \right] \right) ^2\right| \\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{\mathcal {M}_\psi ^2}{Th_\mathrm{T}^{d}} \sum _{i=1}^n\left( \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^d} K^{(1)}\left( \frac{\mathbf{x} - \mathbf{y}}{h_\mathrm{T}} \right) f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{y}) d\mathbf{y} dt\right) ^2\\&\quad =\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{\mathcal {M}_\psi ^2h_\mathrm{T}^{d}}{T} \sum _{i=1}^n\left( \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^d} K^{(1)}( \mathbf{z}) f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{x} -h_\mathrm{T}{} \mathbf{z}) d\mathbf{z} dt\right) ^2 \\&\quad = \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2 \frac{\mathcal {M}_\psi ^2h_\mathrm{T}^{d}}{\delta } \left( \frac{1}{n} \sum _{i=1}^n \left( \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{x}) dt \right) ^2+O(h_\mathrm{T}) \right) \\&\quad :=\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2 \frac{\mathcal {M}_\psi ^2h_\mathrm{T}^{d}}{\delta } \frac{1}{ n} \sum _{i=1}^n g_i^{\mathcal {F}_{i-2}} (\mathbf{x}) + O\left( h_\mathrm{T}^{d} \right) \\&\quad =O\left( h_\mathrm{T}^{d} \right) , \end{aligned}$$

where

$$\begin{aligned} g_i^{\mathcal {F}_{i-2} } (\mathbf{x}) = \left( \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{x}) dt \right) ^2, \end{aligned}$$

is a stationary and ergodic process. So the sum \( \frac{1}{n} \sum _{i=1}^n g_i^{\mathcal {F}_{i-2} } (\mathbf{x})\) has a finite limit (see Krengel [47, Theorem 4.4]), which is

$$\begin{aligned} \mathbb {E}\left[ g_1^{\mathcal {F}_{-\delta } } (\mathbf{x}) \right] =g_1(\mathbf{x}) =\left( \int ^{\delta }_{0} f (\mathbf{x}) dt \right) ^2 = \delta ^2 f^2(\mathbf{x}) . \end{aligned}$$
(4.44)

Moreover, observe that, by assumption (A.3), we have

$$\begin{aligned} \frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})}= \frac{1}{f(\mathbf{x})}+O(h_\mathrm{T})= \frac{1}{f(\mathbf{x})}+o(1). \end{aligned}$$

Using Jensen's inequality and assumption (A.6)(iii), we obtain

$$\begin{aligned}&\sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \\&\quad = \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \mathbb {E} \left[ \left( \int _{\mathrm{T}_{i-1}}^{T_i} \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) dt \right) ^2\big | \mathcal {F}_{i-2} \right] \\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \mathbb {E} \left[ \psi ^2(\mathbf{Y}_\mathrm{t}) \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {S}_{\mathrm{t},\delta } \right] \big | \mathcal {F}_{i-2} \right] dt \\&\quad = \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2 \frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \Phi _2(\mathbf{X}_\mathrm{t},\psi ) \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt. \end{aligned}$$

According to assumption (A.6)(iii), the function \(\Phi _2(\cdot ,\psi )\) is continuous in a neighbourhood of \(\mathbf{x}\); hence we have

$$\begin{aligned}&\sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ |\Phi _2(\mathbf{X}_\mathrm{t},\psi )-\Phi _2(\mathbf{x},\psi )| \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\qquad +\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \Phi _2(\mathbf{x},\psi ) \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \underset{\parallel \mathbf{x}-\mathbf{v}\parallel \le h_\mathrm{T}}{\sup }|\Phi _2(\mathbf{v},\psi )-\Phi _2(\mathbf{x},\psi )|\\&\qquad \times \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\qquad +\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{\Phi _2(\mathbf{x},\psi ) }{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\quad =\frac{ 1}{Th_\mathrm{T}^{d}f^2(\mathbf{x})}\left( \Phi _2(\mathbf{x},\psi )+o(h_\mathrm{T}) \right) \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt. \end{aligned}$$

By a first-order Taylor expansion of the function \(f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}}}\), for some \(\mathbf{x}^*\) in \([\mathbf{x}-h_\mathrm{T}{} \mathbf{v},\mathbf{x}]\), we obtain

$$\begin{aligned}&\sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \\&\quad \le \frac{\Phi _2(\mathbf{x},\psi )}{Th_\mathrm{T}^{d}f^2(\mathbf{x})} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt \\&\quad =\frac{\Phi _2(\mathbf{x},\psi )}{Th_\mathrm{T}^{d}f^2(\mathbf{x})} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^{d}} \left( K^{(1)}\right) ^2\left( \frac{\mathbf{x} - \mathbf{u}}{h_\mathrm{T}} \right) f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{u}) d\mathbf{u} dt \\&\quad = \frac{\Phi _2(\mathbf{x},\psi )}{Tf^2(\mathbf{x})} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^{d} } \left( K^{(1)}\right) ^2(\mathbf{v})f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x}-h_\mathrm{T}{} \mathbf{v}) d\mathbf{v} dt\\&\quad = \frac{\Phi _2(\mathbf{x},\psi )}{\delta f^2(\mathbf{x})} \left( \frac{1}{n} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x}) dt+ O(h_\mathrm{T}) \right) \int _{\mathbb {R}^{d} } \left( K^{(1)}\right) ^2(\mathbf{v}) d\mathbf{v}. \end{aligned}$$

It is clear, whenever \(\delta \) is small enough, that the quantities

$$\begin{aligned} \left( \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x} ) dt \right) _{i\in \mathbb {N}} \end{aligned}$$

may be approximated by

$$\begin{aligned} \left( \delta f_{\mathrm{T}_{i-1}}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x} )\right) _{i\in \mathbb {N}}. \end{aligned}$$

Consequently, using the ergodicity and stationarity of the process \((\mathbf{X}_\mathrm{t})_{\mathrm{t} \ge 0}\), it follows that

$$\begin{aligned} \frac{1}{n} \sum _{j=1}^n \left( \int _{\mathrm{T}_{j-1}}^{T_j} f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{j-2}} } (\mathbf{x}) dt \right)= & {} \mathbb {E} \left( \int _{\mathrm{T}_0}^{T_1} f_\mathrm{T}(\mathbf{x}) dt \right) + o(1)\\= & {} \int _0^{\delta } \mathbb {E} \left( f_\mathrm{T}(\mathbf{x}) \right) dt + o(1)\\= & {} \delta f(\mathbf{x}) + o(1). \end{aligned}$$

It follows that

$$\begin{aligned} \sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right]= & {} \frac{\Phi _2(\mathbf{x},\psi )}{ f(\mathbf{x})}\int _{\mathbb {R}^d} \left( K^{(1)}\right) ^2(\mathbf{y}) d\mathbf{y} +O(h_\mathrm{T}). \end{aligned}$$

Letting \(T\rightarrow \infty \), and taking into account the bound \(O(h_\mathrm{T}^{d})\) obtained above for \(\sum _{i=1}^n \left( \mathbb {E} \left[ W_{i}(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \right) ^2\), this implies that

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim }\sum _{i=1}^n \mathbb {E} \left[ \Delta _{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right]= & {} \frac{\Phi _2(\mathbf{x},\psi )}{ f(\mathbf{x})}\int _{\mathbb {R}^d} \left( K^{(1)}\right) ^2(\mathbf{y}) d\mathbf{y}=\sigma ^2(\mathbf{x},\psi ), \end{aligned}$$
(4.45)

4.12 Proof of (b)

Using Hölder's, Markov's, Jensen's and Minkowski's inequalities, together with assumption (A.6)(iii), we obtain, for all \(\epsilon >0\) and all \(p\) and \(q\) such that

$$\begin{aligned} \frac{1}{p}+\frac{1}{q}=1, \end{aligned}$$

that

$$\begin{aligned}&\mathbb E[\Delta _{\mathrm{T},i}^2(\mathbf{x}) \mathbb {1}_{\{|\Delta _{\mathrm{T},i}(\mathbf{x})|> \epsilon \}}]\\&\quad \le ( \mathbb E[\Delta _{\mathrm{T},i}^{2q}(\mathbf{x})])^{1/q} (P\{|\Delta _{\mathrm{T},i}(\mathbf{x})| > \epsilon \})^{1/p} \\&\quad \le \epsilon ^{-2q/p}\mathbb E[|\Delta _{\mathrm{T},i}(\mathbf{x} )|^{2q}]\\&\quad =\frac{ \epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q}(\mathbf{x})}\mathbb {E}\left[ \int _{\mathrm{T}_{i-1}}^{T_i} \left| \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) -\mathbb E\left[ \psi (\mathbf{Y}_\mathrm{t})K^{(1)}\left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t} }{h_\mathrm{T}}\right) | \mathcal {F}_{i-2}\right] \right| ^{2q} dt \right] \\&\quad \le \frac{2^{2q} \epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right| ^{2q} \right] dt\\&\quad = \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| ^{2q}| \mathcal {S}_{\mathrm{t},\delta }\right] \left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right] dt\\&\quad = \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ h_{2q}(\mathbf{X}_\mathrm{t},\psi ) \left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right] dt\\&\quad \le \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\left( \underset{\parallel \mathbf{x}-\mathbf{v}\parallel \le h_\mathrm{T}}{\sup }\left| h_{2q}(\mathbf{v},\psi )-h_{2q}(\mathbf{x},\psi )\right| +h_{2q}(\mathbf{x},\psi ) \right) \\&\qquad \times \int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ \left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right] dt\\&\quad = \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\left( h_{2q}(\mathbf{x},\psi ) +o(1)\right) \int _{\mathrm{T}_{i-1}}^{T_i}\int _{\mathbb {R^d}}\left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{u}}{h_\mathrm{T}} \right) f(\mathbf{u}) d\mathbf{u} dt\\&\quad =\frac{ 2^{2q}\delta \epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\left( h_{2q}(\mathbf{x},\psi ) +o(1)\right) \int _{\mathbb {R^d}}\left( K^{(1)}\right) ^{2q}\left( \mathbf{u} \right) f(\mathbf{x}-h_\mathrm{T}{} \mathbf{u}) d\mathbf{u}. \end{aligned}$$

By a first-order Taylor expansion, we have

$$\begin{aligned}&n\,\mathbb {E}[\Delta _{\mathrm{T},i}^2(\mathbf{x}) \mathbb {1}_{\{|\Delta _{\mathrm{T},i}(\mathbf{x})| > \epsilon \}}]\nonumber \\&\quad \le \frac{2^{2q} \epsilon ^{-2q/p}\left\| \left( K^{(1)}\right) ^{2q}\left( \mathbf{v} \right) \right\| _\infty }{(Th_\mathrm{T}^{d})^{(q-1)}f^{2q-1} (\mathbf{x})}\left( h_{2q}(\mathbf{x},\psi ) +o(1)\right) \nonumber \\&\quad =o(1). \end{aligned}$$
(4.46)

Combining statements (4.45) and (4.46), conditions (a) and (b) are satisfied; hence, the central limit theorem for martingale difference sequences, together with (4.43), yields

$$\begin{aligned} (Th_\mathrm{T}^{d+2})^{1/2} \left( \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )- \widetilde{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )\right) \overset{\mathcal {D}}{\rightarrow } \mathcal {N}\left( 0, \frac{\Phi _2(\mathbf{x},\psi )}{f(\mathbf{x})}\int _{\mathbb {R}^d} \left( K^{(1)}\right) ^2(\mathbf{y}) d\mathbf{y} \right) . \end{aligned}$$
(4.47)
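The limiting variance in (4.47) can be checked by simulation in a simplified i.i.d. setting (our stand-in for the sampled ergodic path; all numerical choices below are ours): with \(d=1\), \(\psi (y)=y\) and a Gaussian \(K\), the normalised statistic has empirical variance close to \(\Phi _2(\mathbf{x},\psi )\int (K^{(1)})^2(\mathbf{y})d\mathbf{y}/f(\mathbf{x})\).

```python
import numpy as np

rng = np.random.default_rng(5)

K1 = lambda u: -u * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian K^(1)
x, h, n, reps = 0.0, 0.05, 20000, 1000
f_x = 1.0 / np.sqrt(2.0 * np.pi)        # density of X ~ N(0,1) at x = 0
stats = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal(n)
    Y = np.sin(X) + 0.3 * rng.standard_normal(n)  # Phi_2(0) = E[Y^2 | X=0] = 0.09
    S = (Y * K1((x - X) / h)).sum() / (n * h**2 * f_x)
    stats[r] = np.sqrt(n * h**3) * S              # normalisation as in (4.47), d = 1
sigma2 = 0.09 * (1.0 / (4.0 * np.sqrt(np.pi))) / f_x  # Phi_2 * int (K')^2 / f
print(stats.var(ddof=1), sigma2)                      # both approximately 0.032
```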

Lemmas 4.9 and 4.10 combined with Theorem 2.4 complete the proof of Theorem 2.5.\(\square \)