1 Introduction

Nonparametric estimation has been the subject of intense investigation for many years, and this has led to the development of a large variety of methods. Owing to its numerous applications and its important role in mathematical statistics, the problem of estimating density and regression functions has attracted considerable interest during the last decades. One of the most commonly used classes of estimators is that formed by the so-called kernel-type estimators. For theoretical aspects along with statistical applications, the interested reader is referred to Tapia and Thompson [71], Wertz [74], Devroye and Györfi [23], Devroye [22], Nadaraya [57], Härdle [38], Wand and Jones [73], Eggermont and LaRiccia [30], Devroye and Lugosi [24] and the references therein. Recently, a number of statistical problems have found unexpected solutions when investigated from a “modal point of view”; this includes such classical procedures as clustering, and it has led to renewed interest in estimation and inference for the mode. The estimation of the conditional mode of an outcome variable given the regressors is called modal regression. Modal regression is an alternative to the usual regression methods for exploring the relationship between a response variable \(\mathbf{Y}\) and a predictor variable \(\mathbf{X}\). Unlike conventional regression, which is based on the conditional mean of \(\mathbf{Y}\) given \(\mathbf{X} = \mathbf{x}\), modal regression estimates conditional modes of \(\mathbf{Y}\) given \(\mathbf{X} = \mathbf{x}\). Modal regression is a more reasonable modelling approach than the usual regression in at least two scenarios. The first is when the conditional density function is skewed or heavy-tailed: the conditional mean may then fail to provide a good summary of the relation between the response and the covariate. The second is when the conditional density function has multiple local modes, which occurs when the relation between \(\mathbf{X}\) and \(\mathbf{Y}\) contains multiple patterns; the conditional mean may not capture any of these patterns, so it can be a very poor summary; see Chen et al. [17] for an example. This situation had already been pointed out in Tarter and Lock [72]. Modal regression has a wide variety of applications, including the analysis of traffic and forest fire data [31, 75], econometrics [45, 50, 51], and machine learning [33, 68]. For example, Kemp and Santos Silva [45] argue that the mode is the most intuitive measure of central tendency for positively skewed data found in many econometric applications such as wages, prices, and expenditures [45, p. 93]. For more recent reviews and further details on the subject, the reader is referred to Chen [16] and Chacón [14].

We will start by providing some notation and definitions needed in the forthcoming sections. Let \((\mathbf{X}_\mathrm{t},\mathbf{Y}_\mathrm{t})_{\mathrm{t}\ge 0}\) be an \(\mathbb {R}^{d}\times \mathbb {R}^{q}\)-valued strictly stationary and ergodic continuous time process defined on a probability space \((\Omega , \mathcal {F},\mathbb {P})\). Let \(g(\cdot ,\cdot )\) be the density function of the random vector \((\mathbf{X}_\mathrm{t},\mathbf{Y}_\mathrm{t})\), \(f(\cdot )\) the density of \(\mathbf{X}_\mathrm{t}\) and \(\rho (\cdot )\) the density of \(\mathbf{Y}_\mathrm{t}\). For a given measurable function \(\psi (\cdot )\) and \(\mathbf{x}\in \mathbb {R}^{d}\), the regression function, whenever it exists, is defined to be

$$\begin{aligned} m(\mathbf{x},\psi )=\mathbb {E}(\psi (\mathbf{Y})\mid \mathbf{X}=\mathbf{x}). \end{aligned}$$

In this situation, we have the random design regression model, and \(\mathbf{X}\) is called the design variable and \(\mathbf{Y}\) the response variable. The random design model is very important in clinical studies, where the design variable usually represents the age of a particular individual receiving treatment, and \(\mathbf{Y}\) is the quantity whose dependence on the age of the patient is investigated. A typical example (from forensic medicine) is given by Härdle and Marron [39], where \(\mathbf{Y}\) stands for the liver weight of female persons (depending on their age). The inequality \(\mathbf{x} \le \mathbf{y}\) is understood componentwise, i.e., \(x_{j}\le y_{j}\) for all \(j = 1,\ldots ,d\). The introduction of the function \(\psi (\cdot )\) allows us to include some important special cases:

  • \(\psi (\mathbf{Y}) = \mathbb {1}\{\mathbf{Y}\le \mathbf{y}\}\) gives the conditional distribution of \(\mathbf{Y}\) given \(\mathbf{X}=\mathbf{x}\).

  • \(\psi (\mathbf{Y}) =\mathbf{Y}^{k}\) gives the conditional moments of \(\mathbf{Y}\) given \(\mathbf{X}=\mathbf{x}\).

In the present paper, we focus on estimating the location \({\varvec{\Theta }}\) and the size \(m({\varvec{\Theta }},\psi )\) of a unique maximum (mode, peak) of the (unknown) function \(m(\cdot ,\psi )\). Our method is indirect in the sense that the estimators of \({\varvec{\Theta }}\) and \(m({\varvec{\Theta }},\psi )\) are based on a kernel estimator \(\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\) of the regression curve \(m(\mathbf{x},\psi )\). We will use the Nadaraya–Watson estimator which is defined by

$$\begin{aligned} \widehat{m}_\mathrm{T}(\mathbf{x},\psi ):=\left\{ \begin{array}{lcr} \frac{\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T\psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt}{\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt},&{}\text{ if }&{}\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\ne 0,\\ \displaystyle \frac{1}{T}\int _0^T\psi (\mathbf{Y}_\mathrm{t})dt, &{}\text{ if }&{}\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt= 0, \end{array}\right. \end{aligned}$$

where \(K(\cdot )\) is a kernel, \(h_\mathrm{T}\) is a positive sequence of real numbers such that

$$\begin{aligned} (i)\ \underset{T \rightarrow \infty }{\lim } h_\mathrm{T} = 0, \quad (ii)\ \underset{T \rightarrow \infty }{\lim } Th_\mathrm{T}^{d}= +\infty , \quad \text{ or }\quad (iii)\ \underset{T \rightarrow \infty }{\lim } \frac{\displaystyle Th_\mathrm{T}^{d}}{\displaystyle \log T}= + \infty . \end{aligned}$$
(1.1)

Condition (i) is used to obtain the asymptotic unbiasedness of kernel-type (density or regression) estimators. A more restrictive assumption on \(h_{T}\) is needed for consistency; this is given by condition (ii), see Parzen [61]. In general, strong consistency fails to hold when either (i) or (iii) is not satisfied. Now the location \({\varvec{\Theta }}\) (mode) and the size \(m({\varvec{\Theta }},\psi )\) are estimated by the respective functionals \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) and \(\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\) pertaining to \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\); i.e., \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) is chosen through the equation

$$\begin{aligned} \widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}, \psi ) = \sup _{\mathbf{x}\in \mathfrak {C}} \widehat{m}_\mathrm{T}(\mathbf{x},\psi ), \end{aligned}$$
(1.2)

where the supremum is taken over some compact set \(\mathfrak {C}\subset \mathbb {R}^{d}\). Note that \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) exists if \(K(\cdot )\) is continuous; however, it may not be unique. In fact, it is known that kernel estimators tend to produce some additional, superfluous modality. In this context, one can consider

$$\begin{aligned} \widehat{{{\varvec{\Theta }} }}_\mathrm{T}=\inf \left\{ \mathbf{t}\in \mathfrak {C} ~~\text{ such } \text{ that }~~ \widehat{m}_\mathrm{T}(\mathbf{t},\psi )=\sup _{\mathbf{x}\in \mathfrak {C}}\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\right\} , \end{aligned}$$

where the infimum is taken with respect to the lexicographic order on \(\mathbb {R}^{d}\). However, this has no bearing on the asymptotic theory; our results are valid for any choice of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) satisfying (1.2). To ensure both uniqueness and measurability of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), one could use the so-called mode functional on \(C(\mathfrak {C})\) apparently introduced by Eddy [28], which selects the infimum of the maximising locations and whose measurability is proved in the same paper. Alternatively, Grund and Hall [35] suggested breaking ties at random if necessary. In any case, the validity of our proofs is not affected by potential non-measurability of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), since we can always replace probabilities by outer probabilities where necessary with no further changes in the proofs; this issue is discussed in Ziegler [79], and also in Ziegler [77, 78] and Herrmann and Ziegler [41]. As mentioned in Ziegler [79], estimating the location and size of the maximum of a nonparametric curve by the corresponding functionals of a kernel estimator of the curve is not new; it stems from the closely related problem of estimating the mode of a density. In continuation of the pioneering work of Parzen [61] on density estimation and estimation of the mode, Eddy [28, 29] and Romano [66] tackled optimality questions for kernel density estimators of the mode. Romano [66] also seems to be the first to consider data-dependent bandwidths in this framework. In another paper, Romano [65] examined the limiting behaviour of bootstrap estimators of the location of the mode, an idea used later by Grund and Hall [35] in the context of bandwidth selection by minimising the bootstrapped \(L_{p}\)-error of the mode estimator. It is worth noticing that the conditional mode function estimate of the predictor was used for the first time by Collomb et al. [18]. Kernel-type estimators have been studied extensively under various dependence settings; we cite, among many others, Samanta and Thavaneswaran [67], Ould-Saïd [59], Quintela-Del-Río and Vieu [64], Berlinet et al. [5], Ferraty et al. [34], Ezzahrioui and Ould-Saïd [32], Benrabah et al. [3] and the references therein. Quintela-Del-Río and Vieu [64] motivated the use of the conditional mode by pointing out that prediction of \(\mathbf{Y}\)-values given the \(\mathbf{X}\)-values is usually achieved through regression function estimation. Finally, in the i.i.d. setting, the almost sure convergence along with the mean convergence of the conditional density estimator was obtained by Youndjé [76]. Ota et al. [60] proposed a new estimator of the conditional mode that avoids the curse of dimensionality and at the same time is computationally scalable, thereby complementing the existing methods above.
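Before proceeding, we note that the estimator (1.2) is straightforward to approximate numerically once the continuous-time integrals defining \(\widehat{m}_\mathrm{T}\) are replaced by Riemann sums over a discretised sample path. The following is a minimal sketch, assuming \(d=q=1\), a Gaussian kernel, and a synthetic data-generating process chosen purely for illustration; none of these choices is prescribed by the theory.

```python
import numpy as np

def nw_estimate(x_grid, X, Y, h, psi=lambda y: y):
    """Nadaraya-Watson estimate of m(x, psi) = E[psi(Y) | X = x]; the time
    integrals are approximated by averages over the discretised path."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # Gaussian kernel
    num = np.array([np.mean(psi(Y) * K((x - X) / h)) for x in x_grid])
    den = np.array([np.mean(K((x - X) / h)) for x in x_grid])
    # second branch of the definition: unconditional average where den vanishes
    return np.where(den > 0, num / np.maximum(den, 1e-300), np.mean(psi(Y)))

rng = np.random.default_rng(0)
T, dt = 500.0, 0.05
n = int(T / dt)                       # discretisation of [0, T]
X = rng.standard_normal(n)            # stand-in for an ergodic design path
Y = np.exp(-(X - 0.5) ** 2) + 0.1 * rng.standard_normal(n)  # peak near x = 0.5

grid = np.linspace(-2.0, 2.0, 401)    # the compact set C
h_T = T ** (-1.0 / 5.0)               # satisfies (1.1) for d = 1
m_hat = nw_estimate(grid, X, Y, h_T)
theta_hat = grid[np.argmax(m_hat)]    # location estimate, cf. (1.2)
size_hat = m_hat.max()                # size estimate m_hat(theta_hat, psi)
print(theta_hat, size_hat)
```

Note that `np.argmax` returns the first maximiser on the grid, which mimics the lexicographic-infimum convention above when ties occur.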

Within the framework described above, our aim is to establish consistency and asymptotic normality results (which in turn can be exploited for the construction of confidence intervals) for the estimators \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) and \(\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\) of the location and the size of the peak, under mild local smoothness conditions on the regression function \(m(\cdot ,\psi )\) and the design density \(f(\cdot )\) (mostly imposed locally in a neighbourhood of \({{\varvec{\Theta }} }\)). These results will be valid for a wide class of kernels, not necessarily compactly supported; this includes, in particular, the Gaussian kernel, which is widely used in practice. Mixing is a kind of asymptotic independence assumption, commonly adopted for the sake of simplicity, which can be unrealistic in situations where there is strong dependence between the data. Extending nonparametric functional ideas to general dependence structures is a rather underdeveloped field. Note that the ergodic framework avoids the widely used strong mixing condition and its variants as measures of dependence, together with the involved probabilistic calculations they imply (see, for instance, Masry [55]). It is worth noticing that ergodicity is implied by all the usual mixing conditions, being weaker than each of them; see, e.g., Remark 2.6 on page 50 in combination with Proposition 2.8 on page 51 in Bradley [13]. Further motivations to consider ergodic data are discussed in Laib and Louani [48, 49], Didi and Louani [27], Bouzebda et al. [12], Bouzebda et al. [8], Bouzebda and Didi [9,10,11] and Krebs [46]; in some of these references, the definitions of the ergodic property of continuous time processes are given. In the present work, we do not assume anything beyond ergodicity of the underlying process; hence the present work extends the scope of applications compared to the existing works. On the other hand, we mention that there exist interesting processes which are ergodic but not mixing, according to Andrews [1] and Bradley [13]. An example of an ergodic and non-mixing process was considered in Sect. 5.3 of Leucht and Neumann [52]. Indeed, let \(\{(T_{i},\lambda _{i}):i\in \mathbb {Z}\}\) be strictly stationary with \(T_{i}\mid \mathcal {T}_{i-1}\sim \text{ Poisson }(\lambda _{i})\), where \(\mathcal {T}_i\) is the \(\sigma \)-field generated by \((T_{i}, \lambda _{i},T_{i-1},\lambda _{i-1},\ldots )\), and assume that \(\lambda _{i}=\kappa (\lambda _{i-1}, T_{i-1})\) for some \(\kappa :[0,\infty )\times \mathbb {N}\rightarrow (0,\infty )\). Such a process is ergodic under suitable conditions on \(\kappa \), but it is not mixing in general; see Remark 3 of Neumann [58] for a counterexample. We refer to Leucht and Neumann [52] for further details and motivations for the use of the ergodicity assumption. One of their arguments is that, for certain classes of processes, it can be much easier to prove ergodicity than a mixing condition. It is known that any sequence \(\{\varepsilon _\mathrm{t}:t\in \mathbb {Z}\}\) of i.i.d. random variables is ergodic. Hence, it is immediately clear that \(\{\mathbf{Y}_\mathrm{t} :t \in \mathbb {Z}\}\) with

$$\begin{aligned} \mathbf{Y}_\mathrm{t} = \vartheta ((\ldots , \varepsilon _{\mathrm{t}-1}, \varepsilon _\mathrm{t} ), (\varepsilon _{\mathrm{t}+1},\varepsilon _{\mathrm{t}+2},\ldots )) \end{aligned}$$

is also ergodic. Didi [25] has constructed an example of a non-mixing ergodic continuous time process. It is well known that the fractional Brownian motion \(\{W_\mathrm{t}^H:t\ge 0\}\) with parameter \(H\in (0,1)\) has strictly stationary increments. In particular, the fractional Gaussian noise, defined for every \(s>0\) by

$$\begin{aligned} \{G_\mathrm{t}^H:t\ge 0\}:=\{W_{\mathrm{t}+s}^H-W_\mathrm{t}^H :t\ge 0\}, \end{aligned}$$

is a strictly stationary centered long-memory process when \(H\in (\frac{1}{2},1)\) (see, for instance, Beran [4, p. 55] and Lu [53, p. 17]); hence the strong mixing condition is not satisfied. Let \(\{G_\mathrm{t}:t\ge 0\}\) be a strictly stationary centered Gaussian process with correlation function

$$\begin{aligned} R(t)=\mathbb {E}[G_0G_\mathrm{t}]. \end{aligned}$$

Relying on Lemma 4.2 in Maslowski and Pospíšil [54], it follows that the process \(\{G_\mathrm{t}:t\ge 0\}\) is ergodic whenever

$$\begin{aligned} \lim _{\mathrm{t}\rightarrow \infty } R(t)=0, \end{aligned}$$

which is the case for the process \(\{G_\mathrm{t}^H: t\ge 0\}\). The ergodicity hypothesis thus seems to be the most natural one, and it provides a better framework to study data series generated, for example, by noisy chaos.
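For concreteness, here is a minimal simulation sketch of the Poisson autoregression of Leucht and Neumann [52] recalled above. The linear intensity map \(\kappa \) below is an illustrative choice (a contraction when \(a+b<1\)), not the only one covered by their results.

```python
import numpy as np

def poisson_autoregression(n, kappa, lam0=1.0, seed=0):
    """Simulate T_i | past ~ Poisson(lam_i), lam_i = kappa(lam_{i-1}, T_{i-1}):
    a strictly stationary ergodic count process, in general not mixing."""
    rng = np.random.default_rng(seed)
    lam = np.empty(n)
    counts = np.empty(n, dtype=np.int64)
    lam[0] = lam0
    counts[0] = rng.poisson(lam[0])
    for i in range(1, n):
        lam[i] = kappa(lam[i - 1], counts[i - 1])
        counts[i] = rng.poisson(lam[i])
    return counts, lam

# illustrative linear map kappa(lam, t) = omega + a*lam + b*t with a + b < 1
counts, lam = poisson_autoregression(10_000, lambda l, t: 0.5 + 0.3 * l + 0.4 * t)
print(counts.mean())   # time average; close to E[T_0] by the ergodic theorem
```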

To the best of our knowledge, the results presented here respond to a problem that has not been studied systematically until recently, which is the main motivation of this paper. Indeed, we establish the exact rate of strong uniform consistency of the estimator \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) and we characterise its limiting law. To prove our results, we base our methodology on a martingale approximation, which provides a unified nonparametric time series analysis framework enabling a systematic study of dependent data. This methodology is quite different from the approaches existing in the i.i.d. context.

The layout of the article is as follows. The assumptions and asymptotic properties of the estimators are given in Sect. 2, which includes the optimal convergence rates and the asymptotic normality of the estimators \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\). Some concluding remarks and possible future developments are mentioned in Sect. 3. To avoid interrupting the flow of the presentation, all mathematical proofs are presented in Sect. 4.

2 Main results

Let us introduce some notation and definitions. Let \(\alpha = (\alpha _{1},\ldots ,\alpha _{d})\) be a multi-index of the nonnegative integers \(\alpha _{ i}\), set \(|\alpha |=\sum _{ i=1}^{d}\alpha _{ i}\), and let

$$\begin{aligned} D^{\alpha }=\frac{\displaystyle \partial ^{|\alpha |}}{\displaystyle (\partial x_{1})^{\alpha _{1}}\cdots (\partial x_{d})^{\alpha _{d}}} \end{aligned}$$

denote the partial differential operator of order \(|\alpha |\). For \(\alpha =0\), set \(D^{\alpha }=\mathrm {id}\), the identity operator. For continuous real-valued functions \(\zeta _{1}(\cdot )\) and \(\zeta _{2}(\cdot )\) that are s-times continuously differentiable on \(\mathbb {R}^{d}\), the Leibniz formula gives

$$\begin{aligned} D^{\alpha }(\zeta _{1}\zeta _{2})=\sum _{\{\beta \,:\,\beta \le \alpha \}}\left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) (D^{\beta }\zeta _{1})(D^{\alpha -\beta }\zeta _{2}), \quad \text{ where }\quad \left( {\begin{array}{c}\alpha \\ \beta \end{array}}\right) =\frac{\alpha !}{(\alpha -\beta )!\,\beta !}. \end{aligned}$$

We will use the notation

$$\begin{aligned} D^{i}\zeta _{1}=\zeta _{1}^{(i)}~ \text{ for } ~ i=1,\ldots ,s. \end{aligned}$$

Let us define the partial derivatives of order one of the regression estimator by

$$\begin{aligned} \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \left( \frac{\displaystyle M_\mathrm{T}(\mathbf{x},\psi )}{\displaystyle f_\mathrm{T}(\mathbf{x})}\right) ^{(1)}\\= & {} \frac{\displaystyle M_\mathrm{T}^{(1)}(\mathbf{x},\psi )f_\mathrm{T}(\mathbf{x})-f_\mathrm{T}^{(1)}(\mathbf{x})M_\mathrm{T}(\mathbf{x},\psi )}{\displaystyle f_\mathrm{T}^2(\mathbf{x})}. \end{aligned}$$

The derivatives of order \(\alpha =1,2\) of the estimators \(f_\mathrm{T}(\mathbf{x})\) and \(M_\mathrm{T}(\mathbf{x},\psi )\) (the denominator and numerator of \(\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\), corresponding to \(\alpha =0\) below) are defined as follows

$$\begin{aligned} f_\mathrm{T}^{(\alpha )}(\mathbf{x})= \frac{1}{Th_\mathrm{T}^{d+\alpha }}\int _0^T K^{(\alpha )}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt, \end{aligned}$$

and

$$\begin{aligned} M_\mathrm{T}^{(\alpha )}(\mathbf{x},\psi )= \frac{1}{Th_\mathrm{T}^{d+\alpha }}\int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(\alpha )}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt. \end{aligned}$$
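These derivative estimators are also easy to approximate on discretised data. Below is a minimal sketch for \(d=1\) with a Gaussian kernel (for which \(K^{(1)}(u)=-uK(u)\)), combining \(f_\mathrm{T}^{(1)}\) and \(M_\mathrm{T}^{(1)}\) through the quotient rule displayed above; replacing the time integrals by sample averages over a discretised path is again an assumption of the sketch.

```python
import numpy as np

def m_hat_first_derivative(x, X, Y, h, psi=lambda y: y):
    """Approximate m_hat_T^(1)(x, psi) for d = 1 via the quotient rule
    (M_T^(1) f_T - f_T^(1) M_T) / f_T^2, with sample averages for (1/T) integrals."""
    u = (x - X) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    K1 = -u * K                                      # its first derivative
    f_T = np.mean(K) / h                             # f_T(x): order alpha = 0
    f_T1 = np.mean(K1) / h ** 2                      # f_T^(1)(x): alpha = 1
    M_T = np.mean(psi(Y) * K) / h                    # M_T(x, psi)
    M_T1 = np.mean(psi(Y) * K1) / h ** 2             # M_T^(1)(x, psi)
    return (M_T1 * f_T - f_T1 * M_T) / f_T ** 2
```

A root of this function near the grid maximiser offers an alternative way to locate \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\).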

We denote by \(m^{(1)}(\cdot ,\psi )\) the gradient of the function \(m(\cdot ,\psi )\) : \(\mathbb {R}^{d} \rightarrow \mathbb { R}\), that is, \(m^{(1)}(\cdot ,\psi )\) is the \(d \times 1\)-vector of the partial derivatives of \(m(\cdot ,\psi )\)

$$\begin{aligned} m^{(1)}(\cdot ,\psi )=\left( \frac{\partial }{\partial x_{1}}m(\cdot ,\psi ),\ldots ,\frac{\partial }{\partial x_{d}}m(\cdot ,\psi )\right) ^{\top }. \end{aligned}$$

Using the definition of the conditional mode function, i.e. the mode of \(m(\cdot ,\psi )\), we have

$$\begin{aligned} m^{(1)}({\varvec{\Theta }},\psi ) =\left( \frac{\partial }{\partial x_{1}}m({\varvec{\Theta }},\psi ),\ldots ,\frac{\partial }{\partial x_{d}}m({\varvec{\Theta }},\psi )\right) ^{\top }= 0. \end{aligned}$$
(2.1)

Similarly, by the definition of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) as a maximiser of \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\) (assumed to lie in the interior of \(\mathfrak {C}\)), it follows that

$$\begin{aligned} \widehat{m}^{(1)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) =\left( \frac{\partial }{\partial x_{1}}\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ),\ldots ,\frac{\partial }{\partial x_{d}}\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\right) ^{\top }= 0. \end{aligned}$$

We denote by \(m^{(2)}(\cdot ,\psi )\) the Hessian of the function \(m(\cdot ,\psi )\), that is, the \(d \times d\) matrix of the second partial derivatives of \(m(\cdot ,\psi )\). Furthermore, assumption (A.7) implies that

$$\begin{aligned} m^{(2)}({\varvec{\Theta }},\psi )<0, \quad \text{ and } \quad \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) <0. \end{aligned}$$

By the definition of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), we have \(\widehat{m}^{(1)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) = 0\) so that

$$\begin{aligned} \widehat{m}^{(1)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )-\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi )=-\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi ). \end{aligned}$$
(2.2)

For each \(i \in \{1,\ldots , d\}\), a Taylor expansion applied to the real-valued function \(\frac{\partial }{\partial x_{i}} \widehat{m}_\mathrm{T}(\cdot ,\psi )\) implies the existence of \({\varvec{\Theta }}_\mathrm{T}^\star (i)=(\Theta _{\mathrm{T},1}^\star (i),\ldots ,\Theta _{\mathrm{T},d}^\star (i))^{\top }\) such that

$$\begin{aligned} \left\{ \begin{array}{l} \displaystyle \frac{\partial }{\partial x_{i}}\widehat{m}_\mathrm{T}(\widehat{{\varvec{\Theta }}}_\mathrm{T},\psi ) - \displaystyle \frac{\partial }{\partial x_{i}}\widehat{m}_\mathrm{T}({{\varvec{\Theta }}},\psi ) =\sum _{j=1}^{d}\displaystyle \frac{\partial ^2}{\partial x_{i}\partial x_{j}}\widehat{m}_\mathrm{T}({\varvec{\Theta }}_\mathrm{T}^\star (i),\psi )(\widehat{\Theta }_{\mathrm{T},j}-\Theta _{j}), \\ \left| \Theta _{\mathrm{T},j}^\star (i)-\Theta _{j}\right| \le |\widehat{\Theta }_{\mathrm{T},j}-\Theta _{j}|, ~~j\in \{1,\ldots ,d\}. \end{array}\right. \end{aligned}$$
(2.3)

Define the \(d \times d\) matrix \(H_\mathrm{T} = (H_{\mathrm{T},i,j})_{1\le i,j\le d}\) by setting

$$\begin{aligned} H_{\mathrm{T},i,j}= \frac{\partial ^2}{\partial x_{i}\partial x_{j}}\widehat{m}_\mathrm{T}({\varvec{\Theta }}_\mathrm{T}^\star (i),\psi ). \end{aligned}$$

Equation (2.2) can then be rewritten as

$$\begin{aligned} H_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{{\varvec{\Theta }}})=-\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi ). \end{aligned}$$
(2.4)
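Numerically, (2.4) says that the deviation \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\) is recovered from a single linear solve. A minimal sketch with placeholder values for \(H_\mathrm{T}\) and the gradient (in practice both would come from kernel derivative estimates as above):

```python
import numpy as np

# placeholder Hessian-type matrix H_T and gradient m_hat_T^(1)(Theta, psi), d = 2
H_T = np.array([[-2.0, 0.1],
                [0.1, -1.5]])        # negative definite near the peak
grad = np.array([0.03, -0.02])

# equation (2.4): H_T (Theta_hat - Theta) = -m_hat_T^(1)(Theta, psi)
deviation = np.linalg.solve(H_T, -grad)
print(deviation)
```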

Relation (2.4) will play an important role in our proofs, in particular for the asymptotic normality. To formulate our assumptions, some additional notation is required. For some constant \(\delta >0\) small enough, let \(n\in \mathbb {N}\) be such that \(T=\delta n\), and set \(T_j=j\delta \) for \(j=1,\ldots ,n\). Let \(\mathcal {F}_\mathrm{t}\) be the \(\sigma \)-field defined by

$$\begin{aligned} \mathcal {F}_\mathrm{t}:=\sigma \{(\mathbf{X}_s,\mathbf{Y}_s): 0\le s< t \}. \end{aligned}$$

Set \(\mathcal {F}_{j}\) to be the \(\sigma \)-field defined by

$$\begin{aligned} \mathcal {F}_{j}:=\sigma \{(\mathbf{X}_s,\mathbf{Y}_s): 0\le s\le T_j\}. \end{aligned}$$

Let \( \mathcal {S}_{\mathrm{t},\delta }\) be the \(\sigma \)-field defined by

$$\begin{aligned} \mathcal {S}_{\mathrm{t},\delta }:= \sigma \{(\mathbf{X}_s,\mathbf{Y}_s),(\mathbf{X}_r): 0\le s< t; t\le r \le t+\delta \}. \end{aligned}$$

Let \(\mathcal {G}_t:=\sigma \{(\mathbf{X}_s,\mathbf{Y}_s): 0\le s\le t\}\), and, for \(\delta >0\) small enough, let \(g^{\mathcal {G}_{t-\delta }}(\cdot )\) and \(\rho ^{\mathcal {G}_{t-\delta }}(\cdot )\) be the conditional densities of \((\mathbf{X},\mathbf{Y})\) and \(\mathbf{Y}\), respectively, given the \(\sigma \)-field \(\mathcal {G}_{t-\delta }\). Finally, if \(\zeta (\cdot )\) is a real-valued random function which satisfies \(\zeta (u) / u \rightarrow 0\) a.s. as \(u \rightarrow 0\), we write \(\zeta (u)=o_{\text{ a.s. } }(u)\). In the same way, we say that \(\zeta (u)\) is \(O_{\text{ a.s. } }(u)\) if \(\zeta (u) / u\) is a.s. bounded as \(u \rightarrow 0\).

2.1 Assumptions

In our analysis, the following assumptions are needed.

  1. (A.1)

The kernel \(K(\cdot )\) is a compactly supported probability density function such that:

    1. (i)

the kernel \(K(\cdot )\) is Lipschitz of order \(\gamma \) with constant \(C_K<\infty \), i.e.,

      $$\begin{aligned} |K(\mathbf{x})-K(\mathbf{x}^{'}) | \le C_K\Vert \mathbf{x}-\mathbf{x}^{'}\Vert ^{\gamma }, \quad (\mathbf{x},\mathbf{x}^{'})\in \mathbb {R}^{2d}; \end{aligned}$$
    2. (ii)

      \(\int _{\mathbb {R}^d} \Vert \mathbf{x}\Vert K(\mathbf{x}) d\mathbf{x} <\infty ;\)

  2. (A.2)

There exists a constant \(\Gamma < \infty \) such that

    $$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } |m(\mathbf{x},\psi )|< \Gamma ; \end{aligned}$$
  3. (A.3)
    1. (i)

Recall that \(\mathfrak {C}\) is a compact set of \(\mathbb {R}^d\). Assume that there exist constants \(0<\lambda \le \eta <\infty \) such that, for all \(\mathbf{x} \in \mathfrak {C}\),

      $$\begin{aligned} \lambda \le f(\mathbf{x}) \le \eta ; \end{aligned}$$
    2. (ii)

      the density \(f(\cdot )\) is an element of \(\mathcal {C}^2(\mathbb {R}^{d})\);

  4. (A.4)

    For every \(t\in \mathbb {R}_+\), for every \(\mathbf{x}\in \mathbb {R}^{d},\)

    1. (i)

      The conditional density \(f_{\mathbf{X}_\mathrm{t}}^{\mathcal {F}_{\mathrm{t}- \delta }}(\cdot )\) of \(\mathbf{X}_\mathrm{t}\) given the \(\sigma \)-field \(\mathcal {F}_{\mathrm{t}- \delta }\) exists a.s. and is an element of \(\mathcal {C}^2(\mathbb {R}^{d})\);

    2. (ii)

      For any \(\delta >0\) small enough

      $$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \frac{1}{T} \int _0^T f_{\mathbf{X}_\mathrm{t}}^{\mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt = f(\mathbf{x}) , \quad \text{ a.s. }; \end{aligned}$$
  5. (A.5)


    $$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \underset{\mathbf{x} \in \mathbb {R}^d}{\sup } \left| \frac{1}{T} \int _0^T f_{\mathbf{X}_\mathrm{t}}^{\mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt - f(\mathbf{x}) \right| = 0, \quad \text{ a.s. }, \end{aligned}$$

    for any \(\delta >0\) small enough;

  6. (A.6)

    For any t and r such that \(t\in [0,T]\) and \(t\le r\le t+\delta \) we have

    1. (i)
$$\begin{aligned} \mathbb {E}(\psi (\mathbf{Y}_r)\vert \mathcal {S}_{\mathrm{t},\delta })=\mathbb {E}(\psi (\mathbf{Y}_r)\vert \mathbf{X}_r)=m(\mathbf{X}_r,\psi ); \end{aligned}$$
    2. (ii)

      there exist constants \(C_\psi >0\) and \(\beta >0\) such that, for any couple \((\mathbf{x},\mathbf{x}^\prime )\in \mathbb {R}^{2d}\),

      $$\begin{aligned} \left| m(\mathbf{x},\psi )-m(\mathbf{x}^\prime ,\psi )\right| \le C_\psi \left\| \mathbf{x}-\mathbf{x}^\prime \right\| ^\beta ; \end{aligned}$$
    3. (iii)

      For any \(k\ge 2\) and any \(\delta >0\),

      $$\begin{aligned} \mathbb {E}\left( \left| \psi ^k(\mathbf{Y}_r)\right| \vert \mathcal {S}_{\mathrm{t},\delta }\right) =\mathbb {E}\left( \left| \psi ^k(\mathbf{Y}_r)\right| \vert \mathbf{X}_r\right) , \end{aligned}$$

and the function \(\Phi _k(\mathbf{x},\psi )=\mathbb {E}\left( \left| \psi ^k(\mathbf{Y})\right| \vert \mathbf{X}=\mathbf{x}\right) \) is continuous in a neighbourhood of \(\mathbf{x}\);

  7. (A.7)

    For any fixed \(\mathbf{x} \in \mathbb {R}^d\),

    1. (i)

      \(m(\mathbf{x},\psi )\) is twice differentiable on \(\mathbb {R}^{d}\), the matrix \(m^{(2)}(\mathbf{x},\psi )\) is continuous in a neighbourhood of \({\varvec{\Theta }}\), and \(m^{(2)}({\varvec{\Theta }},\psi )\) is nonsingular;

    2. (ii)

\(m^{(2)}(\cdot ,\psi )\) is bounded on \(\mathbb {R}^{d}\).

2.2 Comments on hypotheses

Conditions (A.1) are very common in the nonparametric function estimation literature; they impose some regularity upon the kernels used in our estimates. In particular, under condition (A.1)(i), the kernel function exploits the smoothness of the density or regression function. If we relax the requirement that the kernel \(K(\cdot )\) be a density, the convergence rate can be faster; indeed, it can be made arbitrarily close to the parametric rate \(n^{-1}\) as the order of the kernel increases. In fact, Chacón et al. [15] showed that the parametric rate \(n^{-1}\) can be attained by the use of super-kernels, and that super-kernel density estimators automatically adapt to the unknown degree of smoothness of the density. The main drawback of higher-order kernels in this situation is that they take negative values, so the estimated density need not be a density itself. The interested reader is referred to, e.g., Jones et al. [44], Jones and Signorini [43] and Jones [42]. Notice that \((\psi ^2(\mathbf{Y}_\mathrm{t}))_{\mathrm{t}\ge 0}\) is a measurable transform of the stationary ergodic process \((\mathbf{X}_\mathrm{t},\mathbf{Y}_\mathrm{t})_{\mathrm{t}\ge 0}\), hence itself stationary and ergodic. Therefore, making use of Proposition 4.3 of Krengel [47] and then the ergodic theorem, we obtain

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \frac{1}{T} \int _0^T \psi ^2(\mathbf{Y}_\mathrm{t}) dt = \mathbb {E}\left[ \psi ^2(\mathbf{Y}_0)\right] \quad \text{ a.s. } \end{aligned}$$
(2.5)

Condition (A.3)(i) is a technical condition that simplifies our proofs; precisely, we assume that the density function \(f(\cdot )\) is bounded away from zero and infinity on the compact set \(\mathfrak C\), in a similar way as in Ziegler [79], Stute [70], Harel and Puri [40] and Debbarh [20]. For any set \(B\subset \mathbb {R}^d\) and \(\epsilon >0\), denote by \(B^\epsilon \) the set of all \(\mathbf{x}\in \mathbb {R}^d\) such that there exists \(\mathbf{y}\in B\) with \(\Vert \mathbf{x}-\mathbf{y}\Vert <\epsilon \). One could instead assume only that \(f(\cdot )\) is continuous and strictly positive on \(\mathfrak C^\epsilon \), but this would add much extra complexity to the proofs. Condition (A.4) involves the ergodic nature of the data as given, for instance, in Györfi et al. [36]. Assume that \(\rho ^{\mathcal G_{\mathrm{t}-\delta }}(\cdot )\) and \(g^{\mathcal G_{\mathrm{t}-\delta }}(\cdot ,\cdot )\) belong, at least, to the space \(\mathcal {C}^0\) of continuous functions, which is a separable Banach space. Moreover, approximating the integrals \(\displaystyle {\int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y}) dt}\) and \(\displaystyle {\int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y}) dt}\) by their Riemann sums, it follows that

$$\begin{aligned} \displaystyle { T^{-1}\int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y}) dt}&\backsimeq&\displaystyle { n^{-1}\sum _{i=1}^{n} \rho ^{\mathcal G_{\mathrm{t}_i-\delta }}(\mathbf{y}) } \\= & {} \displaystyle {n^{-1} \sum _{j=1}^{n} \rho ^{\mathcal G_{(j-1)\delta }}(\mathbf{y}) }, \end{aligned}$$

and

$$\begin{aligned} \displaystyle { T^{-1}\int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y}) dt}&\backsimeq&\displaystyle { n^{-1}\sum _{i=1}^{n} g^{\mathcal G_{\mathrm{t}_i-\delta }}(\mathbf{x},\mathbf{y}) } \\= & {} \displaystyle {n^{-1} \sum _{j=1}^{n} g^{\mathcal G_{(j-1)\delta }}(\mathbf{x},\mathbf{y}) }. \end{aligned}$$

Since the processes \((\mathbf{X}_{\mathrm{T}_j}, \mathbf{Y}_{\mathrm{T}_j})_{j\ge 1}\) and \((\mathbf{Y}_{\mathrm{T}_j})_{j\ge 1}\) are stationary and ergodic (see Proposition 4.3 of Krengel [47]), following Delecroix [21] (see Lemma 4 and Corollary 1 along with their proofs), one may prove that the sequences \((\rho ^{\mathcal G_{(j-1)\delta }}(\mathbf{y}))_{j\ge 1}\) and \((g^{\mathcal G_{(j-1)\delta }}(\mathbf{x},\mathbf{y}))_{j\ge 1}\) of conditional densities are stationary and ergodic. Moreover, making use of Beck's [2] theorem (see, for instance, Györfi et al. [36], Theorem 2.1.1), it follows that

$$\begin{aligned}&\lim _{T\rightarrow \infty }\sup _{\mathbf{y}\in \mathbb {R}^q}\left| \frac{1}{T} \int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y}) dt-\mathbb {E}(\rho ^{\mathcal G_{-\delta }}(\mathbf{y}))\right| \\&\quad =\lim _{T\rightarrow \infty }\sup _{\mathbf{y}\in \mathbb {R}^q}\left| \frac{1}{T} \int _0^T \rho ^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{y})dt-\rho (\mathbf{y})\right| = 0, \quad \text{ a.s. }, \end{aligned}$$

and

$$\begin{aligned}&\lim _{\mathrm{T}\rightarrow \infty }\sup _{\mathbf{x}\in \mathbb {R}^d}\left| \frac{1}{T} \int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y}) dt-\mathbb {E}(g^{\mathcal G_{-\delta }}(\mathbf{x},\mathbf{y}))\right| \\&\quad =\lim _{\mathrm{T}\rightarrow \infty }\sup _{\mathbf{x}\in \mathbb {R}^d}\left| \frac{1}{T} \int _0^T g^{\mathcal G_{\mathrm{t}-\delta }}(\mathbf{x},\mathbf{y})dt-g(\mathbf{x},\mathbf{y})\right| = 0, a.s. \end{aligned}$$

It is then clear that both the conditions (A.4) and (A.5) are satisfied. Condition (A.6)(i) is usual in the literature dealing with the study of ergodic processes. The hypothesis (A.6)(ii) is a regularity condition upon the regression function. For the condition (A.6)(iii), we can refer to the following examples.

Example 2.1

Consider the regression model \(Y_t= m(X_t) + \epsilon _t\), where the random variables \(\epsilon _t\) are martingale differences with respect to the \(\sigma \)-field \(\mathcal {S}_{r,\delta }\), \(r\le t\le r+\delta \), generated by \(\big \{(X_s,\epsilon _s), (X_t) : 0\le s< r,\; r\le t\le r + \delta \big \}\). Clearly, we have

$$\begin{aligned} \mathbb E[Y_t|\mathcal {S}_{r,\delta }]= m(X_t), \end{aligned}$$

almost surely.

Example 2.2

Consider the regression model \(Y_t= m(X_t) + \sigma (X_t)\epsilon _t\), where the random variables \(\epsilon _t\) are centered and independent of the process \((X_t)_{t\ge 0}\). Taking \(\mathcal {S}_{r,\delta }\) as the \(\sigma \)-field generated by \(\big \{(X_s): 0\le s\le r\big \}\), it follows, for \(t\le r\), that

$$\begin{aligned} \mathbb E[Y_t|\mathcal {S}_{r,\delta }]= \mathbb E[m(X_t)+\sigma (X_t)\epsilon _t|\mathcal {S}_{r,\delta }]=m(X_t)+\sigma (X_t)\mathbb E[\epsilon _t]=m(X_t), \end{aligned}$$

almost surely.

Remark 2.3

For notational convenience, we have chosen the same bandwidth sequence for all margins. This assumption can easily be dropped if one wants to make use of vector bandwidths (see, in particular, Chapter 12 of Devroye and Lugosi [24]). With obvious changes in the notation, our results and their proofs remain true when \(h_\mathrm{T}\) is replaced by a vector bandwidth \(\mathbf{h}_\mathrm{T} = (h^{(1)}_\mathrm{T}, \ldots , h^{(d)}_\mathrm{T})\), where \(\min _{1\le i\le d} h^{(i)}_\mathrm{T} > 0\). In this situation we set \(h_\mathrm{T}^{d}=\prod _{i=1}^{d} h_\mathrm{T}^{(i)}\), and for any vector \(\mathbf{v} = (v_{1} ,\ldots ,v_{d})\) we replace \(\mathbf{v}/h_\mathrm{T}\) by \((v_{1}/h_\mathrm{T}^{(1)},\ldots ,v_{d}/h_\mathrm{T}^{(d)})\), as sketched below. For ease of exposition, we use real-valued bandwidths throughout the text.
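A minimal sketch of the product-kernel evaluation with a coordinatewise bandwidth vector, as described in the remark; the one-dimensional Gaussian kernel is an illustrative choice.

```python
import numpy as np

def product_kernel(v, h_vec):
    """Evaluate K(v_1/h^(1)) * ... * K(v_d/h^(d)) / (h^(1) * ... * h^(d))
    for a vector bandwidth h_vec, with a Gaussian one-dimensional kernel."""
    u = v / h_vec                                    # coordinatewise v_j / h^(j)
    k1d = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return np.prod(k1d) / np.prod(h_vec)

print(product_kernel(np.array([0.2, -0.1]), np.array([0.5, 0.8])))
```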

2.3 Theoretical properties

Below, we write \(Z {\mathop {=}\limits ^{\mathcal {D}}} \mathcal {N}(\mu , \sigma ^{2} )\) whenever the random variable Z follows a normal law with expectation \(\mu \) and variance matrix \(\sigma ^{2}\), \({{\mathop {\rightarrow }\limits ^{\mathcal {D}}}}\) denotes the convergence in distribution and \({{\mathop {\rightarrow }\limits ^{\mathbb {P}}}}\) the convergence in probability.

2.3.1 Consistency

The following theorem gives the almost sure consistency result.

Theorem 2.4

Under the hypotheses (A.1)–(A.4) and (A.6), for any n large enough, we have

$$\begin{aligned} \Vert \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }} \Vert = O(h_\mathrm{T}^\beta )+O\left( \left( \frac{\log {T}}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) ,\ \text{ a.s. } \end{aligned}$$

The proof of Theorem 2.4 is postponed to Sect. 4.
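For orientation, the two terms in this rate balance when \(h_\mathrm{T}^{\beta }\asymp (\log T/(Th_\mathrm{T}^{d}))^{1/2}\); a short side calculation (not part of the theorem, which allows any bandwidth satisfying (1.1)) gives

$$\begin{aligned} h_\mathrm{T}^{2\beta +d}=\frac{\log T}{T} \quad \Longleftrightarrow \quad h_\mathrm{T}=\left( \frac{\log T}{T}\right) ^{1/(2\beta +d)}, \quad \text{ whence }\quad \Vert \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\Vert = O\left( \left( \frac{\log T}{T}\right) ^{\beta /(2\beta +d)}\right) \ \text{ a.s. } \end{aligned}$$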

2.3.2 Asymptotic normality

To establish the asymptotic normality of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), in view of the statement (2.4), we have to prove that the gradient \(\widehat{m}^{(1)}_\mathrm{T}({{\varvec{\Theta }}},\psi )\), suitably normalised, is asymptotically normally distributed, and that the matrix \(H_\mathrm{T}\) converges in probability to \(m^{(2)}({\varvec{\Theta }},\psi )\). Let G be the \(d \times d\) matrix defined by, for \(i,j=1,\ldots ,d\),

$$\begin{aligned} G_{i,j}=\int _{\mathbb {R}^{d} }\frac{\partial }{\partial u_{i}}K(\mathbf{u})\frac{\partial }{\partial u_{j}}K(\mathbf{u})d\mathbf{u}. \end{aligned}$$

Let us also introduce \(V({\varvec{\Theta }},\psi )\), the \(d \times d\) matrix defined, for \(i,j=1,\ldots ,d\), by

$$\begin{aligned} V_{i,j}({\varvec{\Theta }},\psi )=\frac{\mathbb {E}(|\psi ^2(\mathbf{Y})|\vert \mathbf{X}={\varvec{\Theta }})}{f({\varvec{\Theta }})}G_{i,j}. \end{aligned}$$

The main result to be proved here may now be stated precisely as follows.

Theorem 2.5

  1. 1.

    Under the assumptions (A.1), (A.3)(i)–(ii), (A.4) and (A.6), for any n large enough, we have

$$\begin{aligned} \sqrt{Th_\mathrm{T}^{d+1}}\, \widehat{m}_\mathrm{T}^{(1)}({\varvec{\Theta }},\psi ){\mathop {\rightarrow }\limits ^{\mathcal {D}}}\mathcal {N}(0,{V({\varvec{\Theta }},\psi )}). \end{aligned}$$
  2. 2.

If the assumptions (A.1)–(A.5), (A.6)(i) and (A.7) are fulfilled, then, as \(T\rightarrow \infty \), \(\widehat{m}_\mathrm{T}^{(2)}(\cdot ,\psi )\) converges uniformly to \(m^{(2)}(\cdot ,\psi )\) on the compact set \(\mathfrak {C}\) and, for any n large enough, we have

$$\begin{aligned} \sqrt{Th_\mathrm{T}^{d+1}} \left( \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\right) {\mathop {\rightarrow }\limits ^{\mathcal {D}}}\mathcal {N}(0,[m^{(2)}({\varvec{\Theta }},\psi )]^{-1}V({\varvec{\Theta }},\psi )[m^{(2)}({\varvec{\Theta }},\psi )]^{-1}). \end{aligned}$$
    (2.6)

The proof of Theorem 2.5 is postponed to Sect. 4.

2.4 Confidence set

The asymptotic variance in the central limit theorem depends on unknown functions, which have to be estimated in practice. Let us introduce \(\widehat{V}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )\), an estimate of \(V({\varvec{\Theta }},\psi )\), that is, the \(d \times d\) matrix defined, for \(i,j=1,\ldots ,d\), by

$$\begin{aligned} \widehat{V}_{i,j}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )=\frac{ \widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ^{2})}{f_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T})}G_{i,j}. \end{aligned}$$

The asymptotic variance is estimated by

$$\begin{aligned}{}[\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}\widehat{V}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )[\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}. \end{aligned}$$

Furthermore, from (2.6), the approximate confidence region of \({\varvec{\Theta }}\) can be obtained as

$$\begin{aligned} {\varvec{\Theta }}\in \left[ \widehat{{{\varvec{\Theta }} }}_\mathrm{T} \pm c_\alpha \frac{\left[ [\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}\widehat{V}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )[\widehat{m}^{(2)}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )]^{-1}\right] ^{1/2}}{\sqrt{Th_T^{d+1}}}\right] , \end{aligned}$$

where \(c_\alpha \) denotes the \((1-\alpha )\)-quantile of the multivariate normal distribution; note that \(c_\alpha \) is not unique when \(d>1\), since \({\varvec{\Theta }}\) is a vector. Sinotina and Vogel [69] used a different approach to construct confidence sets, derived as suitable neighbourhoods of maximum points of a regression estimator; their approach relies on concentration-of-measure inequalities for the regression estimators.
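For the plug-in region above, the following is a numerical sketch for \(d=1\) (so all matrices are scalars and \(c_\alpha \) is the usual normal quantile). Every input value below is an illustrative placeholder that would, in practice, come from the kernel estimates.

```python
import numpy as np
from scipy.stats import norm

T, h_T, d = 500.0, 0.3, 1
theta_hat = 0.52      # location estimate
m2_hat = -2.1         # m_hat_T^(2)(theta_hat, psi), negative at a peak
V_hat = 0.8           # plug-in estimate of V(Theta, psi)

# asymptotic variance [m^(2)]^{-1} V [m^(2)]^{-1}, cf. (2.6)
avar = V_hat / m2_hat ** 2
se = np.sqrt(avar / (T * h_T ** (d + 1)))
c = norm.ppf(0.975)   # 95% two-sided level when d = 1
print(theta_hat - c * se, theta_hat + c * se)
```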

Remark 2.6

It can be observed that our proofs generalise those used for the kernel density mode. Hence, one can easily obtain the corresponding results for density mode estimators as a particular case of our setting. More precisely, one can consider the kernel estimator of the conditional density of \(\mathbf{Y}\) given \(\mathbf{X}=\mathbf{x}\), defined by

$$\begin{aligned} \widehat{g}_\mathrm{T}(\mathbf{y}\mid \mathbf{x}):= \frac{\displaystyle \frac{ 1}{Th_\mathrm{T}^d\breve{h}_\mathrm{T}^{q}}\int _0^T \mathbf{K}\left( \frac{\mathbf{y}-\mathbf{Y}_\mathrm{t}}{\breve{h}_\mathrm{T}}\right) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt}{\displaystyle \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt}, \quad \text{ for }\quad \frac{ 1}{Th_\mathrm{T}^d}\int _0^T K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\ne 0, \end{aligned}$$

where \(\mathbf{K}(\cdot )\) is a kernel, \(\breve{h}_\mathrm{T}\) is a positive sequence of real numbers tending to 0 at a specific rate. We refer to Bouzebda et al. [8] for more details about the framework of functional ergodic discrete time processes.

Remark 2.7

Chen et al. [17] considered the conditional (or local) mode set at x, defined as

$$\begin{aligned} M(x) = \biggl \{y: \frac{\partial }{\partial y} p(y\mid x)=0, \frac{\partial ^2}{\partial y^2} p(y\mid x)<0 \biggr \}, \end{aligned}$$
(2.7)

where \(p(y\mid x) = p(x,y)/f(x)\) is the conditional density of Y given \(X=x\). Since \(f(x)\) does not depend on y, the set M(x) can be expressed in terms of the joint density as:

$$\begin{aligned} M(x) = \biggl \{y: \frac{\partial }{\partial y} p(x,y)=0, \frac{\partial ^2}{\partial y^2} p(x,y)<0 \biggr \}. \end{aligned}$$
(2.8)

At each x, the local mode set M(x) may consist of several points, so that M(x) is in general a multivalued function. Under appropriate conditions, these modes change smoothly as x changes; thus, local modes behave like a collection of surfaces, called modal manifolds in Chen et al. [17]. In our setting, we have considered the extension of the work of Ziegler [79] to the multivariate ergodic setting. The approaches are different, and the extension of Chen et al. [17] to the ergodic setting is of interest. The proof of such a statement, however, would require a different methodology than that used in the present paper, and we leave this problem open for future research.

Remark 2.8

In continuous time, data are often collected by using a sampling scheme. Several discretisation schemes have been proposed throughout the literature including deterministic and randomised sampling. The interested reader is referred to Masry [56], Prakasa Rao [62, 63], Bosq [7] and Blanke and Pumo [6]. To simplify the idea, we consider the density estimator of \(f(\cdot )\) based on \(\{\mathbf{X}_{t}: t\in [0,T]\}\) and let \(\{\mathbf{X}(t_k): k=1,\ldots ,n\}\) be its sampled discrete sequence. The sampled estimator of the density \(f(\cdot )\) is then

$$\begin{aligned} f_{n}(\mathbf{x})=\frac{1}{nh^{d}_{n}}\sum _{k=1}^{n}K\left( \frac{\mathbf{x}-\mathbf{X}_{t_{k}}}{h_{n}}\right) . \end{aligned}$$
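A minimal sketch of this sampled estimator for \(d=1\) (the Gaussian kernel is an illustrative choice):

```python
import numpy as np

def sampled_density(x, X_sampled, h_n):
    """Kernel density estimate built from the sampled sequence {X(t_k)}, d = 1."""
    n = len(X_sampled)
    K = np.exp(-0.5 * ((x - X_sampled) / h_n) ** 2) / np.sqrt(2 * np.pi)
    return K.sum() / (n * h_n)
```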

As in Masry [56], we only recall two cases of designs: irregular sampling and random sampling.

Deterministic sampling.:

Consider the case where the instants \((t_{k})_{1\le k\le n}\) are deterministic irregularly spaced with

$$\begin{aligned} \inf _{1\le k< n}|t_{k+1}-t_{k}|=\frac{1}{\tau }, \end{aligned}$$

for some \(\tau >0\). For \(1\le k \le n\), consider \(\mathcal {G}_{k}\), the \(\sigma \)-field generated by \(\{\mathbf{X}_{s} :0\le s\le t_{k}\}\). Obviously, \((\mathcal {G}_{k})_{1\le k \le n}\) is an increasing family of \(\sigma \)-fields.

Random sampling.:

Assume that the instants \((t_{k})_{1\le k\le n}\) form a sequence of uniform random variables in the interval [0, T] independent of the process \(\{\mathbf{X}_{t}: t\in [0,T]\}\). Define

$$\begin{aligned} 0 \le \tau _{1}< \cdots < \tau _{n} \le T \end{aligned}$$

as the associated order statistics. Notice that \((\tau _{k})_{1\le k \le n}\) are the process observation points; obviously, the spacings between these points are all positive. As a consequence, taking \(\mathcal {G}_{k}\) to be the \(\sigma \)-field generated by \(\{\mathbf{X}_{s} :0\le s\le \tau _{k}\}\), it follows that \((\mathcal {G}_{k})_{1\le k \le n}\) is an increasing sequence of \(\sigma \)-fields.

We would like to mention here that the penalisation procedure for the choice of the mesh \(\delta \) of the observations gives an optimal rate of convergence, as demonstrated in Comte and Merlevède [19]; we leave this problem open for future research in the framework of ergodic processes.

3 Concluding remarks

In the present paper, we are mainly concerned with the nonparametric regression model in which the regression function \(m(\cdot , \psi )\) is given by \(m(\mathbf{x},\psi ) = \mathbb {E}(\psi (\mathbf{Y}) \mid \mathbf{X} = \mathbf{x})\) for a measurable function \(\psi : \mathbb {R}^{q} \rightarrow \mathbb {R}\). Estimation of the location \({\varvec{\Theta }}\) (mode) of a unique maximum of \(m(\cdot , \psi )\) by the location \( \widehat{{\varvec{\Theta }}}_\mathrm{T}\) of a maximum of the Nadaraya–Watson kernel estimator \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\) of the curve \(m(\cdot , \psi )\) is considered. Within this context, we obtain consistency and asymptotic normality results for \( \widehat{{\varvec{\Theta }}}_\mathrm{T}\) under mild local smoothness assumptions on \(m(\cdot , \psi )\) and the design density of \(\mathbf{X}\). It is worth noticing that the ergodic framework covers and completes various situations compared to the mixing case and is more convenient to use in practice; in this sense our work extends the existing research in the literature. We have illustrated how to use our results to construct a confidence set for the mode \({\varvec{\Theta }}\). In future research, one could consider the same estimation problem for stationary and ergodic discrete time processes in the case of censored data. It would also be of interest to relax stationarity to local stationarity and establish results similar to those presented in this work, which would require a different mathematical methodology than the one used here. We leave these problems open for further investigation.

4 Proofs

This section is devoted to the proofs of our results. The previously defined notation continues to be used in what follows.

From the definition of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) in (1.2) and that of \({\varvec{\Theta }}\), we have

$$\begin{aligned} |m(\widehat{{{\varvec{\Theta }} }}_\mathrm{T} ,\psi )-m({\varvec{\Theta }},\psi ) |\le & {} |\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi )- m(\widehat{{{\varvec{\Theta }} }}_\mathrm{T},\psi ) | \nonumber \\&+ |\widehat{m}_\mathrm{T}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T} ,\psi )- m({\varvec{\Theta }},\psi )| \nonumber \\\le & {} \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )- m(\mathbf{x},\psi )\right| \nonumber \\&+ \left| \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\ \widehat{m}_\mathrm{T}(\mathbf{x},\psi ) - \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\ m(\mathbf{x},\psi )\right| \nonumber \\\le & {} 2\ \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )- m(\mathbf{x},\psi )\right| . \end{aligned}$$
(4.1)

Consider the following decomposition

$$\begin{aligned} Q_\mathrm{T}(\mathbf{x},\psi )&:= ({\Psi }_\mathrm{T}(\mathbf{x},\psi ) - \bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ))-m(\mathbf{x},\psi )(f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x})), \end{aligned}$$
(4.2)
$$\begin{aligned} R_\mathrm{T}(\mathbf{x},\psi )&:= -B_\mathrm{T}(\mathbf{x},\psi )(f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x})),\end{aligned}$$
(4.3)
$$\begin{aligned} B_\mathrm{T}(\mathbf{x},\psi )&:=\frac{\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi )}{\bar{f}_\mathrm{T}(\mathbf{x})}-m(\mathbf{x},\psi ) ,\end{aligned}$$
(4.4)
$$\begin{aligned} \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi )&= B_\mathrm{T}(\mathbf{x},\psi ) + \frac{ Q_\mathrm{T}(\mathbf{x},\psi )+R_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})}\frac{f(\mathbf{x})}{f_\mathrm{T}(\mathbf{x})}, \end{aligned}$$
(4.5)

where

$$\begin{aligned} \bar{f}_\mathrm{T}(\mathbf{x})= & {} \frac{1}{Th_\mathrm{T}^d} \int _0^T \mathbb {E} \left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{\mathrm{t}-\delta } \right] dt,\\ \Psi _\mathrm{T}(\mathbf{x},\psi )= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \psi (\mathbf{Y}_\mathrm{t})K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt, \end{aligned}$$

and

$$\begin{aligned} \bar{\Psi }_\mathrm{T}(\mathbf{x},\psi )= \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E} \left[ \psi (\mathbf{Y}_\mathrm{t})K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{\mathrm{t}-\delta } \right] dt. \end{aligned}$$

The following simple lemmas will play an instrumental role in the sequel.

Lemma 4.1

Let \((Z_n)_{n\ge 1} \) be a sequence of real martingale differences with respect to the sequence of \(\sigma \)-fields \((\mathcal {F}_n)_{n\ge 1}\), where \(\mathcal {F}_n= \sigma (Z_1,\ldots ,Z_n)\) is the \(\sigma \)-field generated by the random variables \(Z_1,\ldots ,Z_n\). Set

$$\begin{aligned} S_n= \sum _{i=1}^{n} Z_i. \end{aligned}$$

For any \(p \ge 2\) and any \(n\ge 1\), assume that there exist some nonnegative constants C and \(d_n\) such that

$$\begin{aligned} \mathbb {E} \left[ Z_n^p | \mathcal {F}_{n-1}\right] \le C^{p-1} p!\ d_n^2, \quad \text{ almost } \text{ surely }. \end{aligned}$$

Then, for any \(\epsilon >0\), we have

$$\begin{aligned} \mathbb {P} \left( | S_n| > \epsilon \right) \le 2 \exp \left\{ -\frac{\epsilon ^2}{2(D_n+C\epsilon )}\right\} , \end{aligned}$$

where

$$\begin{aligned} D_n = \sum _{i=1}^n d_i^2. \end{aligned}$$

Lemma 4.2

Let \(\Lambda \times \Lambda ^{'}\) be an index set and, for each \((\eta ,\eta ^{'}) \in \Lambda \times \Lambda ^{'}\), let \(\{ Z_i(\eta ,\eta ^{'}): i \ge 1\}\) be a sequence of martingale differences such that \(\left| Z_i(\eta ,\eta ^{'})\right| \le B\) a.s. Then, for all \(\epsilon >0\) and all sufficiently large n, we have

$$\begin{aligned} P\left\{ \left| \sum _{i=1}^n Z_i(\eta ,\eta ^{'}) \right| > \epsilon \right\} \le 2 \exp \left\{ -\frac{\epsilon ^2}{2nB^2} \right\} . \end{aligned}$$

The following proposition describes the almost sure consistency of \(\widehat{m}_\mathrm{T}(\mathbf{x},\psi )\) with rate.

Proposition 4.3

Under assumptions (A.1)–(A.4) and (A.6), we have

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi )\right|= & {} O(h_\mathrm{T}^\beta )+O\left( \left( \frac{\log {T}}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s }. \end{aligned}$$
(4.6)

4.1 Proof of Proposition 4.3

Making use of conditions (A.2) and (A.3), we infer readily that

$$\begin{aligned}&\underset{\mathbf{x} \in \mathfrak {C}}{\sup }| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi ) |\nonumber \\&\quad =\underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi ) + \frac{ Q_\mathrm{T}(\mathbf{x},\psi )+R_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})} \frac{f(\mathbf{x})}{f_\mathrm{T}(\mathbf{x})}\right| \nonumber \\&\quad \le \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi ) \right| +\frac{1}{\lambda }\ \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \frac{ Q_\mathrm{T}(\mathbf{x},\psi ) + R_\mathrm{T}(\mathbf{x},\psi ) }{ \frac{f_\mathrm{T}(\mathbf{x})}{f(\mathbf{x})}}\right| . \end{aligned}$$
(4.7)

Lemma 4.4

(Didi and Louani [26]) Let \((\mathbf{X}_\mathrm{t})_{\mathrm{t}\ge 0}\) be a strictly stationary and ergodic process. Under (A.1) and (A.4), we then have

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup }\left| \frac{f_\mathrm{T}(\mathbf{x})}{f(\mathbf{x})}-1\right| =o_{a.s}(1), \quad \text{ as }\quad T\longrightarrow \infty . \end{aligned}$$
(4.8)

4.2 Proof of Lemma 4.4

Notice that we have the following decomposition

$$\begin{aligned} \frac{f_\mathrm{T}(\mathbf{x}) - f(\mathbf{x})}{f(\mathbf{x})}= & {} \frac{f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x}) + \bar{f}_\mathrm{T}(\mathbf{x}) - f(\mathbf{x})}{f(\mathbf{x})}\nonumber \\= & {} \frac{1}{f(\mathbf{x})}\left\{ F_{1,T}(\mathbf{x}) + F_{2,T}(\mathbf{x})\right\} , \end{aligned}$$
(4.9)

where

$$\begin{aligned} \bar{f}_\mathrm{T}(\mathbf{x}) = \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] dt . \end{aligned}$$

Two terms are to be investigated; we first take a closer look at the second term \( F_{2,T}(\mathbf{x})\). We have

$$\begin{aligned} F_{2,T}(\mathbf{x})= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] dt - f(\mathbf{x})\\= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \int _{\mathbb {R}^d} K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u}) d\mathbf{u} dt - f(\mathbf{x})\\= & {} \frac{1}{T} \int _0^T \int _{\mathbb {R}^d} K(\mathbf{r}) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}{} \mathbf{r}) d\mathbf{r} dt - f(\mathbf{x}). \end{aligned}$$

A Taylor expansion of \(f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}\mathbf{r})\) in a neighbourhood of \(\mathbf{x}\), together with assumption (A.4)(i), yields

$$\begin{aligned} f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}{} \mathbf{r})= f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) + h_\mathrm{T} \nabla f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}^*), \end{aligned}$$

where \(\mathbf{x}^*\) lies between \(\mathbf{x}\) and \(\mathbf{x}-h_\mathrm{T}\mathbf{r}\). It follows from assumption (A.4)(i) that

$$\begin{aligned} \left| f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x} - h_\mathrm{T}{} \mathbf{r})- f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) \right| \le C h_\mathrm{T}\Vert \mathbf{r}\Vert . \end{aligned}$$

Making use of assumptions (A.1)(ii) and (A.4)(ii), it follows that

$$\begin{aligned} F_{2,T}(\mathbf{x})= & {} C h_\mathrm{T} \int _{\mathbb {R}^d}\Vert \mathbf{r}\Vert K(\mathbf{r}) d\mathbf{r} + \frac{1}{T} \int _0^T f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt - f(\mathbf{x}) \nonumber \\= & {} o(1),\quad \text{ a.s }. \end{aligned}$$
(4.10)

Now we focus on the first term of the decomposition (4.9), \(F_{1,T}(\mathbf{x})\); it is clear that

$$\begin{aligned} F_{1,T}(\mathbf{x})= & {} \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \left( K\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt\\= & {} \frac{1}{Th_\mathrm{T}^{d}} \sum _{k=1}^n Z_{\mathrm{T},k}(\mathbf{x}), \end{aligned}$$

where

$$\begin{aligned} T=n\delta , T_k=k\delta \end{aligned}$$

and

$$\begin{aligned} Z_{\mathrm{T},k}(\mathbf{x})= \int _{\mathrm{T}_{k-1}}^{T_k} \left( K\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt. \end{aligned}$$

We observe that \(\{Z_{\mathrm{T},k}(\mathbf{x})\}_{k=1,\ldots ,n}\) is a sequence of martingale differences with respect to the \(\sigma \)-fields

$$\begin{aligned} \mathcal {F}_{k-1}= \sigma (X_s: 0\le s< T_{k-1}). \end{aligned}$$

Under assumption (A.1), the kernel \(K(\cdot )\) is a compactly supported probability density function; we then obtain

$$\begin{aligned} \left| Z_{\mathrm{T},k}(\mathbf{x}) \right|\le & {} \int _{\mathrm{T}_{k-1}}^{T_k} \left| K\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right| dt \\\le & {} 2\delta \underset{\mathbf{u}\in \mathbb {R}^d}{\sup } |K(\mathbf{u})|\\ {}= & {} 2 \delta \widetilde{K}, \end{aligned}$$

where

$$\begin{aligned} \widetilde{K}= \underset{\mathbf{u}\in \mathbb {R}^d}{\sup } |K(\mathbf{u})|. \end{aligned}$$

Now, for any \(\epsilon >0\), making use of Lemma 4.2, we obtain

$$\begin{aligned} \mathbb {P}\left\{ \left| \sum _{k=1}^n Z_{\mathrm{T},k}(\mathbf{x})\right| > \epsilon (Th_\mathrm{T}^d) \right\}\le & {} 2\exp \left\{ - \frac{\epsilon ^2 (Th_\mathrm{T}^d)^2}{8n \delta ^2 \widetilde{K}^2} \right\} \\= & {} 2\exp \left\{ - \frac{\epsilon ^2 Th_\mathrm{T}^{2d}}{8\delta \widetilde{K}^2} \right\} . \end{aligned}$$

The right-hand side of the last inequality is the general term of a convergent series, so that

$$\begin{aligned} \sum _{n=1}^\infty \mathbb {P}\left\{ \left| \sum _{k=1}^n Z_{\mathrm{T},k}(\mathbf{x})\right| > \epsilon (Th_\mathrm{T}^d) \right\} < \infty , \end{aligned}$$

Hence, by the Borel–Cantelli lemma, we conclude that

$$\begin{aligned} F_{1,T}(\mathbf{x}) = o(1), \quad \text{ a.s }. \end{aligned}$$
(4.11)

The proof is achieved by combining the statements (4.10) and (4.11). \(\square \)
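For illustration purposes only (this numerical sketch is ours, with arbitrary constants, and is not part of the original argument), the Azuma-type tail bound above can be evaluated along the discretisation \(T=n\delta \); its summability over \(n\) is precisely what the Borel–Cantelli step requires.

```python
import numpy as np

# Our sketch (arbitrary constants): the Azuma-type tail
# 2*exp(-eps^2 * T * h_T^(2d) / (8 * delta * Ktilde^2)) along T = n*delta.
d, delta, Ktilde, eps = 1, 1.0, 0.75, 0.5
n = np.arange(1, 200001)
T = n * delta
h = T ** (-0.2)                      # bandwidth such that T * h^(2d) -> infinity
tail = 2.0 * np.exp(-eps**2 * T * h ** (2 * d) / (8 * delta * Ktilde**2))
print(tail.sum())                    # partial sums stabilise: the series converges
```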

The following lemma gives the rate of convergence of \(f_\mathrm{T}(\mathbf{x})\) over a compact set \(\mathfrak {C} \).

Lemma 4.5

(Didi and Louani [26]) Under assumption (A.1), we have

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left| f_\mathrm{T}(\mathbf{x}) -\bar{f}_\mathrm{T}(\mathbf{x})\right| = O\left( \left( \frac{\log T}{Th_\mathrm{T}^d} \right) ^{1/2}\right) , \quad \text{ a.s. } \end{aligned}$$
(4.12)

4.3 Proof of Lemma 4.5

We refer to Theorem 1 of Didi and Louani [26]. As in the proof of (4.11) in Lemma 4.4, the result is obtained by using Lemma 4.1 instead of Lemma 4.2. \(\square \)
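As a purely illustrative companion to Lemmas 4.4 and 4.5 (our sketch, not the authors' code), the time integral defining \(f_\mathrm{T}\) can be approximated by a Riemann sum along a simulated stationary ergodic path; here an Ornstein–Uhlenbeck process, whose stationary density is standard normal, and an Epanechnikov kernel, with \(d=1\).

```python
import numpy as np

rng = np.random.default_rng(0)

def ou_path(T, dt, theta=1.0, sigma=np.sqrt(2.0)):
    """Euler scheme for dX_t = -theta*X_t dt + sigma dW_t; stationary law N(0,1)."""
    n = int(T / dt)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n)
    x = np.empty(n)
    x[0] = rng.standard_normal()              # start from the stationary law
    for i in range(1, n):
        x[i] = x[i - 1] * (1.0 - theta * dt) + noise[i]
    return x

def f_T(x0, path, T, h):
    """Riemann sum for (1/(T h)) * int_0^T K((x0 - X_t)/h) dt, Epanechnikov K."""
    dt = T / len(path)
    u = (x0 - path) / h
    K = 0.75 * np.maximum(1.0 - u**2, 0.0)    # compactly supported kernel, cf. (A.1)
    return K.sum() * dt / (T * h)

for T in (100.0, 1000.0, 10000.0):
    h = (np.log(T) / T) ** 0.2                # bandwidth with T*h_T -> infinity
    print(T, f_T(0.0, ou_path(T, 0.01), T, h), 1.0 / np.sqrt(2.0 * np.pi))
```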

In order to complete the proof of Proposition 4.3, we will show Lemma 4.6 and Lemma 4.7 given hereafter.

Lemma 4.6

If hypotheses (A.1)(i), (A.3), (A.4)(i), (A.6)(ii)-(iii) are fulfilled, we have

$$\begin{aligned} \ \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right| =O\left( \left( \frac{\log {T}}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s. } \end{aligned}$$
(4.13)

4.4 Proof of Lemma 4.6

For \(k=1,\ldots ,l\), let \(\mathbf{x}_k \in \mathfrak {C}\). Consider a covering of the compact set \(\mathfrak {C}\) by a finite number \(l\) of spheres \(\mathcal {S}_k\) centred at \(\mathbf{x}_k\), with radius

$$\begin{aligned} r=h_\mathrm{T}^{d+q+1}/T, \end{aligned}$$

we have that

$$\begin{aligned} \mathfrak {C}\subset \bigcup _{k=1}^{l}\mathcal {S}_k. \end{aligned}$$

Then we have

$$\begin{aligned} \ \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right|\le & {} \ \underset{1\le k\le l}{\max }\underset{\mathbf{x} \in \mathcal {S}_k}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) - \Psi _\mathrm{T}(\mathbf{x}_k,\psi ) \right| \\&+ \underset{1\le k\le l}{\max } \left| \Psi _\mathrm{T}(\mathbf{x}_k,\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x}_k,\psi ) \right| \\&+ \ \underset{1\le k\le l}{\max } \underset{\mathbf{x} \in \mathcal {S}_k}{\sup } \left| \bar{\Psi }_\mathrm{T}(\mathbf{x}_k,\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right| \\= & {} \Psi _{1,T}(\mathbf{x},\mathbf{x}_k)+ \Psi _{2,T} (\mathbf{x}_k) + \Psi _{3,T}(\mathbf{x},\mathbf{x}_k). \end{aligned}$$

Making use of the Cauchy–Schwarz inequality together with assumptions (A.1)(i), (A.3), (A.6)(iv) and Lemma 4.4, we readily obtain

$$\begin{aligned}&\left| \Psi _\mathrm{T}(\mathbf{x},\psi ) - \Psi _\mathrm{T}(\mathbf{x}_k,\psi )\right| \nonumber \\&\quad \le \frac{1}{Th_\mathrm{T}^{d}f_\mathrm{T}(\mathbf{x})} \int _0^T \left| \psi (\mathbf{Y}_\mathrm{t})\right| \left| K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| dt \nonumber \\&\quad \le \frac{1}{Th_\mathrm{T}^{d}f_\mathrm{T}(\mathbf{x})} \left( \int _0^T \psi ^2(\mathbf{Y}_\mathrm{t}) dt \right) ^{1/2} \left( \int _0^T \left( K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right) ^{2} dt \right) ^{1/2} \nonumber \\&\quad \le \frac{1}{\sqrt{T}h_\mathrm{T}^{d}f_\mathrm{T}(\mathbf{x})} \left( \frac{1}{T}\int _0^T \psi ^2(\mathbf{Y}_\mathrm{t}) dt \right) ^{1/2} \frac{\sqrt{T}C_K}{h_\mathrm{T}} \Vert \mathbf{x}-\mathbf{x}_k\Vert \nonumber \\&\quad \le \frac{C_K }{h_\mathrm{T}^{d+1}f(\mathbf{x})} \Vert \mathbf{x}-\mathbf{x}_k\Vert \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) \nonumber \\&\quad \le \frac{C_K }{h_\mathrm{T}^{d+1}\lambda } \Vert \mathbf{x}-\mathbf{x}_k\Vert \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) \nonumber \\&\quad \le \frac{C_K }{h_\mathrm{T}^{d+1}\lambda } \frac{h_\mathrm{T}}{T} \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) \nonumber \\&\quad = \frac{C_K }{Th_\mathrm{T}^{d}\lambda } \ O_{a.s}\left( \mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] \right) . \end{aligned}$$
(4.14)

Considering the right hand side of statement (4.14) together with the fact that \(\mathbb {E}\left[ \psi ^2\left( \mathbf{Y}_0\right) \right] < \infty \), we obtain for

$$\begin{aligned} \epsilon _\mathrm{T}=\epsilon _0\left( \log T/Th_\mathrm{T}^{d}\right) ^{1/2}, \end{aligned}$$

that

$$\begin{aligned} \epsilon _\mathrm{T}^{-1} \Psi _{1,T}(\mathbf{x}, \mathbf{x}_k)= O_{a.s}\left( \left( \frac{1}{Th_\mathrm{T}^{d}\log T} \right) ^{1/2}\right) . \end{aligned}$$
(4.15)

Making use of similar arguments as those used for \(\Psi _{1,T}(\mathbf{x}, \mathbf{x}_k)\), we infer that

$$\begin{aligned}&\left| \bar{\Psi }_\mathrm{T}( \mathbf{x},\psi )-\bar{\Psi }_\mathrm{T}(\mathbf{x}_k,\psi )\right| \\&\quad \le \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| \left| K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt \\&\quad \le \frac{C_K}{Th_\mathrm{T}^{d}} \int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| \left\| \frac{\mathbf{x}-\mathbf{x}_k}{h_\mathrm{T}}\right\| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt\\&\quad \le \left\| \frac{\mathbf{x}-\mathbf{x}_k}{h_\mathrm{T}}\right\| \frac{C_K}{Th_\mathrm{T}^{d+1}} \int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt\\&\quad \le \frac{h_\mathrm{T}}{T} \frac{C_K}{h_\mathrm{T}^{d+1}} \left( \frac{1}{T}\int _0^T \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| | \mathcal {F}_{\mathrm{t}-\delta } \right] dt\right) \\&\quad \le \frac{C_K}{Th_\mathrm{T}^{d}} O\left( \mathbb {E}\left[ \left| \psi (\mathbf{Y}_0)\right| \right] \right) . \end{aligned}$$

Using (2.5), we get

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \frac{1}{T}\int _0^T \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) | \mathcal {F}_{\mathrm{t}-\delta } \right] dt = \mathbb {E}\left[ \psi (\mathbf{Y}_0) \right] . \end{aligned}$$

This implies that

$$\begin{aligned} \epsilon _\mathrm{T}^{-1} \Psi _{3,T}(\mathbf{x},\mathbf{x}_k)= & {} O\left( \left( \frac{1}{Th_\mathrm{T}^{d}\log T} \right) ^{1/2}\right) ,\quad \text{ a.s }. \end{aligned}$$
(4.16)

Now we deal with \(\Psi _{2,T}(\mathbf{x}_k)\). Observe that

$$\begin{aligned}&\Psi _{2,T}(\mathbf{x}_k)\\&\quad = \underset{1\le k\le l}{\max }\left| \frac{1}{Th_\mathrm{T}^{d}} \int _0^T \left( \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt \right| \\&\quad = \frac{1}{Th_\mathrm{T}^{d}} \underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| , \end{aligned}$$

where

$$\begin{aligned} R_{\mathrm{T},j}(\mathbf{x}_k)= \int _{\mathrm{T}_{j-1}}^{T_j} \left( \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt, \end{aligned}$$

where

$$\begin{aligned} T=n\delta ~ \text{ and } ~T_j= j\delta . \end{aligned}$$

We observe that the sequence \(\big \{ R_{\mathrm{T},j}(\mathbf{x}_k)\big \}_{1\le j \le n}\) is a sequence of martingale differences adapted to the filtration

$$\begin{aligned} \mathcal {F}_{j-1}=\sigma ((\mathbf{X}_s,\mathbf{Y}_s): 0\le s< T_{j-1}). \end{aligned}$$

For \(p\ge 2\), making use of Jensen's and Minkowski's inequalities, we get

$$\begin{aligned}&\left| \mathbb {E} \left[ R_{\mathrm{T},j}^p(\mathbf{x}_k) \mid \mathcal {F}_{j-2}\right] \right| \nonumber \\&\quad = \left| \mathbb {E} \left[ \left( \int _{\mathrm{T}_{j-1}}^{T_j} \left( \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right) dt \right) ^p\mid \mathcal {F}_{j-2}\right] \right| \nonumber \\&\quad \le \int _{\mathrm{T}_{j-1}}^{T_j}\mathbb {E} \left[ \left| \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) - \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t})K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] \right| ^p \mid \mathcal {F}_{j-2}\right] dt \nonumber \\&\quad \le \int _{\mathrm{T}_{j-1}}^{T_j}\left( \mathbb {E} \left[ \psi ^p(\mathbf{Y}_\mathrm{t})K^p\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] ^{1/p}\right. \nonumber \\&\qquad \left. +\mathbb {E} \left[ \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta } \right] ^p \mid \mathcal {F}_{j-2}\right] ^{1/p} \right) ^p dt \nonumber \\&\quad \le \int _{\mathrm{T}_{j-1}}^{T_j}\left( 2\mathbb {E} \left[ \psi ^p(\mathbf{Y}_\mathrm{t})K^p\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] ^{1/p} \right) ^p dt \nonumber \\&\quad = 2^p \int _{\mathrm{T}_{j-1}}^{T_j}\mathbb {E} \left[ \psi ^p(\mathbf{Y}_\mathrm{t}) K^p\left( \frac{\mathbf{x}_k-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] dt. \end{aligned}$$
(4.17)

Furthermore, by assumption (A.6)(iii), we get

$$\begin{aligned}&\mathbb {E} \left[ \left| \psi ^p(\mathbf{Y}_\mathrm{t}) K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| \mid \mathcal {F}_{j-2}\right] \\&\quad =\mathbb {E} \left[ \left| \mathbb {E}\left[ \psi ^p(\mathbf{Y}_\mathrm{t}) K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {S}_{\mathrm{t},\delta }\right] \right| \mid \mathcal {F}_{j-2}\right] \\&\quad =\mathbb {E} \left[ \left| \mathbb {E}\left[ \psi ^p(\mathbf{Y}_\mathrm{t}) \mid \mathcal {S}_{\mathrm{t},\delta }\right] K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \right| \mid \mathcal {F}_{j-2}\right] \\&\quad = \mathbb {E} \left[ \left| h_p(\mathbf{X}_\mathrm{t})\right| K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \\&\quad \le \mathbb {E} \left[ \left| h_p(\mathbf{X}_\mathrm{t})-h_p(\mathbf{x})\right| K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \\&\qquad + \mathbb {E} \left[ \left| h_p(\mathbf{x})\right| K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \\&\quad \le \mathbb {E} \left[ K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] \left( \underset{\Vert \mathbf{x}-\mathbf{u}\Vert \le \lambda h_\mathrm{T}}{\sup }\left| h_p(\mathbf{u})-h_p(\mathbf{x})\right| +\left| h_p(\mathbf{x})\right| \right) \\&\quad \le \eta (\mathbf{x}) \mathbb {E} \left[ K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right] , \end{aligned}$$

where \(\eta (\mathbf{x})\) is a positive constant. We infer from condition (A.4)(i) that

$$\begin{aligned} \mathbb {E} \left[ K^p\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{j-2}\right]= & {} \int _{\mathbb {R}^{d}} K^p\left( \frac{\mathbf{x}-\mathbf{v}}{h_\mathrm{T}}\right) f_\mathrm{T}^{\mathcal {F}_{j-2}}(\mathbf{v}) d\mathbf{v} \nonumber \\= & {} h_\mathrm{T}^{d} \int _{\mathbb {R}^{d}} K^p(\mathbf{w}) f_\mathrm{T}^{\mathcal {F}_{j-2}}(\mathbf{x}-h_\mathrm{T}{} \mathbf{w}) d\mathbf{w} \nonumber \\\le & {} h_\mathrm{T}^{d} \left\| K \right\| ^p. \end{aligned}$$
(4.18)

Notice that the bound (4.17) can be rewritten, using (4.18), as follows

$$\begin{aligned} \mathbb {E} \left[ R_{\mathrm{T},j}^p(\mathbf{x}_k) \mid \mathcal {F}_{j-2}\right]\le & {} 2^p \eta (\mathbf{x}) \delta h_\mathrm{T}^{d} \left\| K \right\| ^p\nonumber \\\le & {} p! C^{p-2} d_j^2, \end{aligned}$$
(4.19)

where \(C=2\left\| K \right\| \) and

$$\begin{aligned} d_j^2=2\delta \eta (\mathbf{x}) h_\mathrm{T}^{d} \Vert K \Vert ^2. \end{aligned}$$

Let

$$\begin{aligned} D_n=\sum _{j=1}^n d_j^2=\sum _{j=1}^n 2\delta \eta (\mathbf{x}) h_\mathrm{T}^{d} \Vert K \Vert ^2 = O(T h_\mathrm{T}^{d}). \end{aligned}$$

An application of Lemma 4.1 and keeping in mind that \(\epsilon _\mathrm{T}=\epsilon _0\left( \log T/Th_\mathrm{T}^{d}\right) ^{1/2} \), we get, for any \(\epsilon _0>0\),

$$\begin{aligned}&\mathbb {P} \left\{ \underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right|> \epsilon _\mathrm{T} (Th_\mathrm{T}^{d}) \right\} \\&\quad \le \sum _{k=1}^{l} \mathbb {P} \left\{ \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| > \epsilon _\mathrm{T} (Th_\mathrm{T}^{d}) \right\} \\&\quad \le 2l \exp \left\{ - \frac{\epsilon _\mathrm{T}^2 (Th_\mathrm{T}^{d})^2}{2(D_n+C(Th_\mathrm{T}^{d})\epsilon _\mathrm{T})} \right\} \\&\quad = 2l \exp \left\{ - \frac{\epsilon _0^2 (Th_\mathrm{T}^{d}) \log T}{O(Th_\mathrm{T}^{d})\left( 1+ \epsilon _0\left( \frac{\log T}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) }\right\} \\&\quad := 2l\exp \left\{ \log {T^{-\epsilon _0^2 C_1}} \right\} \\&\quad = 2lT^{-\epsilon _0^2 C_1 }, \end{aligned}$$

where \(C_1\) is a positive constant. Since \(l\) grows only polynomially in \(T\), the right-hand side of the previous inequality is, for \(\epsilon _0\) large enough, the general term of a convergent series; hence, via the Borel–Cantelli lemma, we obtain

$$\begin{aligned} \sum _{n=1}^\infty \mathbb {P} \left\{ \underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| > \epsilon _\mathrm{T} (Th_\mathrm{T}^{d}) \right\} < \infty . \end{aligned}$$

This, in turn, implies that

$$\begin{aligned} \frac{1}{Th_\mathrm{T}^{d}}\underset{1\le k\le l}{\max } \left| \sum _{j=1}^n R_{\mathrm{T},j}(\mathbf{x}_k) \right| =O\left( \left( \frac{\log T}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s }. \end{aligned}$$
(4.20)

By combining (4.15), (4.16) and (4.20) with Lemma 4.4, we obtain

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \Psi _\mathrm{T}(\mathbf{x},\psi ) -\bar{\Psi }_\mathrm{T}(\mathbf{x},\psi ) \right| =O\left( \left( \frac{\log T}{Th_\mathrm{T}^{d}}\right) ^{1/2}\right) , \quad \text{ a.s }. \end{aligned}$$
(4.21)

Therefore the proof is complete. \(\square \)
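The covering step in the proof above is purely constructive; the following sketch (ours, taking \(d=q=1\) and \(\mathfrak {C}=[-1,1]\)) computes the radius \(r=h_\mathrm{T}^{d+q+1}/T\) and the resulting number \(l\) of balls, which grows only polynomially in \(T\) and is therefore absorbed by the exponential factor \(T^{-\epsilon _0^2 C_1}\) in the union bound.

```python
import numpy as np

d, q = 1, 1
for T in (1e2, 1e3, 1e4):
    h = T ** (-0.2)
    r = h ** (d + q + 1) / T                 # radius used in the proof of Lemma 4.6
    l = int(np.ceil(2.0 / (2.0 * r)))        # number of balls covering C = [-1, 1]
    print(f"T={T:.0e}  r={r:.2e}  l={l}")    # l grows like T^1.6: polynomial in T
```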

We next evaluate the term \(B_\mathrm{T}(\mathbf{x},\psi )\) defined in (4.5).

Lemma 4.7

Under assumptions (A.1) and (A.6)(i)–(ii), we have

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi )\right| =O\left( h_\mathrm{T}^{\beta }\right) . \end{aligned}$$
(4.22)

4.5 Proof of Lemma 4.7

First, we will use the notation

$$\begin{aligned} K_{h_\mathrm{T}}(\cdot )= \frac{1}{h_\mathrm{T}^d} K\left( \frac{\cdot }{h_\mathrm{T}}\right) . \end{aligned}$$

We let

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi )\right| =\underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| \frac{B_\mathrm{T}^\star (\mathbf{x},\psi )}{\bar{f}_\mathrm{T}(\mathbf{x})}\right| . \end{aligned}$$

Observe that assumption (A.6)(i) implies that

$$\begin{aligned} B_\mathrm{T}^\star (\mathbf{x},\psi )= & {} \bar{\Psi }_\mathrm{T}(\mathbf{x},\psi )- \bar{f}_\mathrm{T}(\mathbf{x}) m(\mathbf{x},\psi )\\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ \left( \psi (\mathbf{Y}_\mathrm{t})-m(\mathbf{x},\psi ) \right) K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \mathbb {E}\left[ \left( \psi (\mathbf{Y}_\mathrm{t})-m(\mathbf{x},\psi ) \right) \mid \mathcal {S}_{\mathrm{t},\delta } \right] \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \left( \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t})\mid \mathbf{X}_\mathrm{t} \right] - m(\mathbf{x},\psi ) \right) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \left( m(\mathbf{X}_\mathrm{t},\psi )- m(\mathbf{x},\psi ) \right) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt. \end{aligned}$$

Under assumption (A.6)(ii), we have

$$\begin{aligned} |B_\mathrm{T}^\star (\mathbf{x},\psi )|\le & {} \underset{\Vert \mathbf{u}-\mathbf{x}\Vert \le h_\mathrm{T} \lambda }{\sup } \left| m(\mathbf{u},\psi )- m(\mathbf{x},\psi ) \right| \frac{1}{T}\int _0^T \mathbb {E}\left[ K_{h_\mathrm{T}}(\mathbf{x}-\mathbf{X}_\mathrm{t}) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\\le & {} C_\psi \lambda ^\beta h_\mathrm{T}^\beta \frac{1}{Th_\mathrm{T}^d}\int _0^T \mathbb {E}\left[ K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \mid \mathcal {F}_{\mathrm{t}-\delta }\right] dt \\= & {} C_\psi \lambda ^\beta h_\mathrm{T}^\beta \bar{f}_\mathrm{T}(\mathbf{x}). \end{aligned}$$

We obtain that

$$\begin{aligned} \underset{\mathbf{x} \in \mathfrak {C}}{\sup } \left| B_\mathrm{T}(\mathbf{x},\psi )\right|= & {} O\left( h_\mathrm{T}^\beta \right) ,\quad \text{ a.s. } \end{aligned}$$
(4.23)

The proof of the lemma is therefore completed. \(\square \)
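The order \(h_\mathrm{T}^{\beta }\) in (4.22) comes solely from the Hölder-type condition on \(m(\cdot ,\psi )\); this can be seen numerically with the toy sketch below (ours; the target function and kernel are arbitrary choices with \(\beta =1/2\)).

```python
import numpy as np

def m(x):
    return np.sqrt(np.abs(x))      # beta-Holder at 0 with beta = 1/2

u = np.linspace(-1.0, 1.0, 2001)
du = u[1] - u[0]
K = 0.75 * (1.0 - u**2)            # Epanechnikov kernel on [-1, 1]
xs = np.linspace(-0.5, 0.5, 201)
for h in (0.2, 0.1, 0.05, 0.025):
    # sup over x of | int K(u) * (m(x - h*u) - m(x)) du |, cf. the bias term (4.22)
    bias = [np.sum(K * (m(x - h * u) - m(x))) * du for x in xs]
    print(h, np.max(np.abs(bias))) # shrinks by ~2^(-1/2) each time h is halved
```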

Recalling (4.21), the proof of Proposition 4.3 is completed by combining Lemmas 4.4, 4.5 and 4.7. \(\square \)

In the following lemma, we give the almost sure convergence of \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\).

Lemma 4.8

Under the hypotheses of Theorem 2.4, we have, as \(T\rightarrow \infty \),

$$\begin{aligned} \Vert \widehat{{{\varvec{\Theta }} }}_\mathrm{T}-{\varvec{\Theta }}\Vert \overset{a.s}{\longrightarrow } 0. \end{aligned}$$

4.6 Proof of Lemma 4.8

The uniqueness hypothesis on the mode \({\varvec{\Theta }}\) of the regression function gives

$$\begin{aligned} \forall \epsilon>0, \exists \eta (\epsilon ) >0; \forall \xi : \Vert {\varvec{\Theta }}- \xi \Vert \ge \epsilon \Rightarrow \left| m({\varvec{\Theta }},\psi )-m(\xi ,\psi ) \right| \ge \eta (\epsilon ) . \end{aligned}$$
(4.24)

Combining conditions (4.24) and (4.1), we obtain, for any fixed \(\mathbf{x} \in \mathfrak {C}\) and all \(\epsilon >0\), that there exists a \(\xi >0\) such that

$$\begin{aligned} \mathbb {P}\left\{ \Vert {\varvec{\Theta }}_\mathrm{T}-{\varvec{\Theta }} \Vert \ge \epsilon \right\} \le \mathbb {P}\left\{ \underset{\mathbf{x}\in \mathfrak {C}}{ \sup } |m_\mathrm{T}(\mathbf{x},\psi )- m(\mathbf{x},\psi )|\ge \xi \right\} . \end{aligned}$$
(4.25)

This gives the desired result, provided that the right-hand side of Eq. (4.25) converges almost surely to zero. The proof is therefore completed by using Proposition 4.3. \(\square \)
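In practice, \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\) is computed by maximising \(\widehat{m}_\mathrm{T}(\cdot ,\psi )\) over a grid of \(\mathfrak {C}\). The sketch below is our minimal illustration in the univariate setting, with i.i.d. draws standing in for the sampled path and a Gaussian kernel standing in for the compactly supported \(K\) of assumption (A.1); all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def m_hat(x_grid, X, Y, h, psi=lambda y: y):
    """Kernel estimate of m(x, psi) on a grid (Gaussian kernel for simplicity)."""
    W = np.exp(-0.5 * ((x_grid[:, None] - X[None, :]) / h) ** 2)
    return (W * psi(Y)[None, :]).sum(axis=1) / np.maximum(W.sum(axis=1), 1e-12)

# m(x) = -(x - 0.3)^2 has a unique mode Theta = 0.3 on C = [0, 1]
n = 5000
X = rng.uniform(0.0, 1.0, n)
Y = -(X - 0.3) ** 2 + 0.05 * rng.standard_normal(n)
grid = np.linspace(0.0, 1.0, 501)
theta_hat = grid[np.argmax(m_hat(grid, X, Y, h=n ** (-0.2)))]
print(theta_hat)                   # close to the true mode 0.3
```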

The following lemma gives the uniform convergence of \(\widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )\) over the compact set \(\mathfrak {C}\). To simplify the presentation, from now on all our arguments will be given in the univariate setting. The extension to the multivariate setting follows easily.

Lemma 4.9

If assumptions (A.1)(ii), (A.3), (A.4)(i), (A.5), (A.6)(i) and (A.7)(i) are fulfilled, we have, as \(T\rightarrow \infty \),

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left\| \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-m^{(2)}(\mathbf{x},\psi )\right\| \longrightarrow 0,\quad \text{ almost } \text{ surely }. \end{aligned}$$
(4.26)

4.7 Proof of Lemma 4.9

We first observe that

$$\begin{aligned} \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )= & {} \left( \frac{\widehat{M}_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})}\right) ^{(2)}\\= & {} \frac{\left( \widehat{M}_\mathrm{T}^{(1)}(\mathbf{x},\psi )f(\mathbf{x})-f^{(1)}(\mathbf{x})\widehat{M}_\mathrm{T}(\mathbf{x},\psi )\right) ^{(1)}}{f^2(\mathbf{x})}\\&-\frac{2f^{(1)} (\mathbf{x})\left( \widehat{M}_\mathrm{T}^{(1)}(\mathbf{x},\psi )f(\mathbf{x})-f^{(1)}(\mathbf{x})\widehat{M}_\mathrm{T}(\mathbf{x},\psi )\right) }{f^3(\mathbf{x})} \\= & {} \frac{\widehat{M}_\mathrm{T}^{(2)}(\mathbf{x},\psi )}{f(\mathbf{x})}-\frac{2f^{(1)}(\mathbf{x})\widehat{M}_\mathrm{T}^{(1)}(\mathbf{x},\psi )}{f^2(\mathbf{x})}\\&+\frac{\widehat{M}_\mathrm{T}(\mathbf{x},\psi )\left( 2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})\right) }{f^3(\mathbf{x})}\\= & {} \frac{1}{f(\mathbf{x})}\frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\\&-\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\\&+\left( \frac{2(f^{(1)}(\mathbf{x}))^2}{f^3(\mathbf{x})}-\frac{f^{(2)}(\mathbf{x})}{f^2(\mathbf{x})}\right) \frac{1}{Th^{d}_\mathrm{T}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt. \end{aligned}$$

Let us define

$$\begin{aligned} \widetilde{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )= & {} \frac{1}{f(\mathbf{x})}\frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&-\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&+\left( \frac{2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})}{f^3(\mathbf{x})}\right) \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt. \end{aligned}$$

Consider the following decomposition

$$\begin{aligned} \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-m^{(2)}(\mathbf{x},\psi )= & {} \widehat{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-\widetilde{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )\nonumber \\&~+\widetilde{m}^{(2)}_\mathrm{T}(\mathbf{x},\psi )-m^{(2)}(\mathbf{x},\psi )\nonumber \\= & {} A_{\mathrm{T},1}(\mathbf{x},\psi )+A_{\mathrm{T},2}(\mathbf{x},\psi ). \end{aligned}$$
(4.27)

To achieve the asymptotic uniform convergence over the compact set \(\mathfrak {C}\) of the term \(A_{\mathrm{T},1}(\mathbf{x},\psi )\) in the decomposition (4.27), we have to prove that

$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+2}_\mathrm{T}}\left( \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right. \right. \nonumber \\&\quad \left. \left. - \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right) \right\| = o_{a.s}(1), \end{aligned}$$
(4.28)
$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+1}_\mathrm{T}}\left( \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt \right. \right. \nonumber \\&\quad \left. \left. -\int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right) \right\| =o_{a.s}(1), \end{aligned}$$
(4.29)
$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d}_\mathrm{T}}\left( \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right. \right. \nonumber \\&\quad \left. \left. - \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right) \right\| =o_{a.s}(1). \end{aligned}$$
(4.30)

Using a simple integration by parts together with Lemma 4.6, we obtain (4.28)–(4.30); combining assumptions (A.1), (A.3), (A.4), (A.6)(i)–(ii) with statements (4.28)–(4.30), we obtain

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \Vert A_{\mathrm{T},1}(\mathbf{x},\psi )\Vert = o_{a.s}(1). \end{aligned}$$
(4.31)

Remark that

$$\begin{aligned} m^{(2)}(\mathbf{x},\psi )= & {} \frac{M^{(2)}(\mathbf{x},\psi )}{f(\mathbf{x})}-\frac{2f^{(1)}(\mathbf{x})M^{(1)}(\mathbf{x},\psi )}{f^2(\mathbf{x})}\nonumber \\&+\frac{\left( 2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})\right) M(\mathbf{x},\psi )}{f^3(\mathbf{x})}. \end{aligned}$$
(4.32)

We now treat the second term \(A_{\mathrm{T},2}(\mathbf{x},\psi )\) in (4.27). We have

$$\begin{aligned}&A_{\mathrm{T},2}(\mathbf{x},\psi )\\&\quad = \frac{1}{f(\mathbf{x})}\frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\qquad -\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\qquad +\left( \frac{2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})}{f^3(\mathbf{x})}\right) \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\qquad -\frac{M^{(2)}(\mathbf{x},\psi )}{f(\mathbf{x})}+\frac{2f^{(1)}(\mathbf{x})M^{(1)}(\mathbf{x},\psi )}{f^2(\mathbf{x})}-\frac{\left( 2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})\right) M(\mathbf{x},\psi )}{f^3(\mathbf{x})}\\&\quad = \frac{1}{f(\mathbf{x})}\left( \frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(2)}(\mathbf{x},\psi )\right) \\&\qquad -\frac{2f^{(1)} (\mathbf{x})}{f^2(\mathbf{x})}\left( \frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(1)}(\mathbf{x},\psi )\right) \\&\qquad + \left( \frac{2(f^{(1)}(\mathbf{x}))^2-f(\mathbf{x})f^{(2)}(\mathbf{x})}{f^3(\mathbf{x})}\right) \left( \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -M(\mathbf{x},\psi )\right) . \end{aligned}$$

To achieve the asymptotic uniform convergence over the compact set \(\mathfrak {C}\) of the term \(A_{\mathrm{T},2}(\mathbf{x},\psi )\), we have to show the following statements

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -M(\mathbf{x},\psi )\right|= & {} o_{a.s}(1), \nonumber \\ \end{aligned}$$
(4.33)
$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(1)}(\mathbf{x},\psi )\right\|= & {} o_{a.s}(1), \nonumber \\ \end{aligned}$$
(4.34)
$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left\| \frac{1}{Th^{d+2}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(2)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt-M^{(2)}(\mathbf{x},\psi ) \right\|= & {} o_{a.s}(1).\nonumber \\ \end{aligned}$$
(4.35)

Observe that statement (4.33) may be rewritten as follows

$$\begin{aligned}&\underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -M(\mathbf{x},\psi )\right| \\&\quad = \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt -m(\mathbf{x},\psi )f(\mathbf{x})\right| . \end{aligned}$$

The desired result can be obtained in a similar way to statement (4.23). We have

$$\begin{aligned} M^{(1)}(\mathbf{x},\psi )= & {} \left( m(\mathbf{x},\psi )f(\mathbf{x})\right) ^{(1)} \nonumber \\= & {} m^{(1)}(\mathbf{x},\psi )f(\mathbf{x})+m(\mathbf{x},\psi )f^{(1)}(\mathbf{x}). \end{aligned}$$
(4.36)

On the other hand, under assumption (A.6)(i), we have

$$\begin{aligned}&\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\quad =\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \big | \mathcal {S}_{\mathrm{t},\delta }\right] \big | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\quad =\frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\mathbb {E}\left[ m(\mathbf{X}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \big | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\\&\quad = \frac{1}{Th^{d+1}_\mathrm{T}} \int _0^T\int _{\mathbb {R}^d}m(\mathbf{u}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})d\mathbf{u}dt. \end{aligned}$$

To integrate by parts, set

$$\begin{aligned} U(\mathbf{u})=m(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})\rightarrow & {} U^{(1)}(\mathbf{u})=m^{(1)}(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})+m(\mathbf{u})\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{u}),\\ V^{(1)}(\mathbf{u})=\frac{1}{h_\mathrm{T}}K^{(1)}\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right)\rightarrow & {} V(\mathbf{u})=-K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) . \end{aligned}$$

By integrating by parts and the change of variable \(\mathbf{y} =\frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\) combined with Taylor expansions of order one, under assumptions (A.4)(i), (A.5) and (A.7)(i), we readily obtain

$$\begin{aligned}&\frac{1}{Th^{d}_\mathrm{T}} \int _0^T\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\nonumber \\&\quad = \frac{1}{Th^{d}_\mathrm{T}} \int _0^T\left( \left[ m(\mathbf{u}) K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})\right] _{\mathbb {R}^d} \right. \nonumber \\&\qquad \left. + \int _{\mathbb {R}^d} K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) \left( m^{(1)}(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})+m(\mathbf{u})\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{u})\right) d\mathbf{u} \right) dt\nonumber \\&\quad =\frac{1}{Th^{d}_\mathrm{T}} \int _0^T\int _{\mathbb {R}^d} K\left( \frac{\mathbf{x}-\mathbf{u}}{h_\mathrm{T}}\right) \left( m^{(1)}(\mathbf{u})f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{u})+m(\mathbf{u})\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{u})\right) d\mathbf{u} dt\nonumber \\&\quad =\frac{1}{T} \int _0^T\int _{\mathbb {R}^d} K(\mathbf{y}) \left( m^{(1)}(\mathbf{x})+O(h_\mathrm{T})\right) \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x})+O(h_\mathrm{T}) \right) d\mathbf{y} dt \nonumber \\&\qquad + \frac{1}{T} \int _0^T\int _{\mathbb {R}^d} K(\mathbf{y}) \left( m(\mathbf{x})+O(h_\mathrm{T})\right) \left( \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{x})+O(h_\mathrm{T})\right) d\mathbf{y} dt\nonumber \\&\quad =\left( m^{(1)}(\mathbf{x}) \left( \frac{1}{T} \int _0^Tf^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt \right) \right. \nonumber \\&\qquad \left. +m(\mathbf{x}) \left( \frac{1}{T} \int _0^T \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{x}) dt \right) \right) \int _{\mathbb {R}^d} K(\mathbf{y})d\mathbf{y} +O(h_\mathrm{T})\nonumber \\&\quad = m^{(1)}(\mathbf{x}) \left( \frac{1}{T} \int _0^T f^{ \mathcal {F}_{\mathrm{t}-\delta }}(\mathbf{x}) dt \right) +m(\mathbf{x}) \left( \frac{1}{T} \int _0^T\left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)} (\mathbf{x}) dt \right) +O(h_\mathrm{T}), \end{aligned}$$
(4.37)

where

$$\begin{aligned} g_\mathrm{t}^{\mathcal {F}_{\mathrm{t}-\delta } } (\mathbf{x}) = \left( f^{ \mathcal {F}_{\mathrm{t}-\delta }}\right) ^{(1)}(\mathbf{x}) \end{aligned}$$

is a stationary and ergodic process. Therefore, one has (see Krengel [47], Theorem 4.4)

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \underset{\mathbf{x} \in \mathbb {R}^d}{\sup } \left| \frac{1}{T} \int _0^T g_\mathrm{t}^{\mathcal {F}_{\mathrm{t}-\delta } } (\mathbf{x})dt - \mathbb {E}\left[ g_1^{\mathcal {F}_{-\delta } } (\mathbf{x}) \right] \right| = 0, \end{aligned}$$
(4.38)

where

$$\begin{aligned} \mathbb {E}\left[ g_1^{\mathcal {F}_{-\delta } } (\mathbf{x}) \right] = f^{(1)} (\mathbf{x}). \end{aligned}$$

By combining the statements (4.37) and (4.38), we conclude the proof of (4.34). Moreover, statement (4.35) may be proved in the same way as statement (4.34), keeping in mind that

$$\begin{aligned} M^{(2)}(\mathbf{x},\psi )= & {} \left( m^{(1)}(\mathbf{x},\psi )f(\mathbf{x})+m(\mathbf{x},\psi )f^{(1)}(\mathbf{x})\right) ^{(1)}\\= & {} m^{(2)}(\mathbf{x},\psi )f(\mathbf{x})+2m^{(1)}(\mathbf{x},\psi )f^{(1)}(\mathbf{x})\\&+m(\mathbf{x},\psi )f^{(2)}(\mathbf{x}). \end{aligned}$$

By applying integration by parts twice, we obtain (4.35). Combining statements (4.33), (4.34) and (4.35) yields

$$\begin{aligned} \underset{\mathbf{x}\in \mathfrak {C}}{\sup } \left| A_{\mathrm{T},2}(\mathbf{x},\psi )\right| = o_{a.s}(1). \end{aligned}$$
(4.39)

Statements (4.31) and (4.39) complete the proof of Lemma 4.9. \(\square \)
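The decomposition of \(\widehat{m}^{(2)}_\mathrm{T}\) used in the proof translates directly into a plug-in estimator built from \(K\), \(K^{(1)}\) and \(K^{(2)}\). The sketch below is ours: it uses a Gaussian kernel (whose derivatives are explicit) and replaces \(f\), \(f^{(1)}\), \(f^{(2)}\) by kernel estimates, in an i.i.d. toy model standing in for the sampled path.

```python
import numpy as np

rng = np.random.default_rng(3)

# Gaussian kernel and its first two derivatives (d = 1)
K0 = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
K1 = lambda u: -u * K0(u)
K2 = lambda u: (u**2 - 1.0) * K0(u)

def m2_hat(x, X, Y, h):
    """Plug-in version of the decomposition of m_T^(2) used in Lemma 4.9."""
    u = (x - X) / h
    f0 = K0(u).mean() / h          # kernel estimates of f, f', f''
    f1 = K1(u).mean() / h**2
    f2 = K2(u).mean() / h**3
    M0 = (Y * K0(u)).mean() / h    # kernel estimates of M, M', M''
    M1 = (Y * K1(u)).mean() / h**2
    M2 = (Y * K2(u)).mean() / h**3
    return M2 / f0 - 2.0 * f1 * M1 / f0**2 + (2.0 * f1**2 - f0 * f2) * M0 / f0**3

n = 200000
X = rng.uniform(-1.0, 1.0, n)
Y = np.sin(2.0 * X) + 0.1 * rng.standard_normal(n)   # m(x) = sin(2x), m'' = -4 sin(2x)
print(m2_hat(0.5, X, Y, h=0.2), -4.0 * np.sin(1.0))  # close, up to smoothing bias
```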

4.8 Proof of Theorem 2.4

Under assumption (A.3)(ii), a second-order Taylor expansion of \(m({\varvec{\Theta }}_\mathrm{T},\psi )\) around \({\varvec{\Theta }}\), using that \(m^{(1)}({\varvec{\Theta }},\psi )=0\) at the mode, gives

$$\begin{aligned} m({\varvec{\Theta }}_\mathrm{T},\psi )= m({\varvec{\Theta }},\psi ) + \frac{1}{2}({\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }})^{\top } m^{(2)}({\varvec{\Theta }}_\mathrm{T}^\star ,\psi )({\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }}), \end{aligned}$$
(4.40)

where \({\varvec{\Theta }}_\mathrm{T}^\star \) lies between \({\varvec{\Theta }}_\mathrm{T}\) and \({\varvec{\Theta }}\). It follows from equations (4.1) and (4.40) that

$$\begin{aligned} \Vert {\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }}\Vert ^{2} \left\| m^{(2)}({\varvec{\Theta }}_\mathrm{T}^\star ,\psi ) \right\| = O \left( \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi ) \right| \right) . \end{aligned}$$

Using Lemma 4.8 and condition (A.7)(ii), one obtains

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } \left\| m^{(2)}({\varvec{\Theta }}_\mathrm{T}^\star ,\psi ) \right\| = \left\| m^{(2)}({\varvec{\Theta }},\psi ) \right\| \ne 0. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert {\varvec{\Theta }}_\mathrm{T}- {\varvec{\Theta }}\Vert ^2 = O \left( \underset{\mathbf{x}\in \mathfrak {C}}{\sup }\left| \widehat{m}_\mathrm{T}(\mathbf{x},\psi )-m(\mathbf{x},\psi ) \right| \right) , \end{aligned}$$
(4.41)

which is enough, while considering Proposition 4.3, to complete the proof. \(\square \)

4.9 Proof of Theorem 2.5

By using formula (2.4), we readily obtain

$$\begin{aligned} \sqrt{Th_\mathrm{T}^{d+2}}\ \ \widehat{m}_\mathrm{T}^{(1)}({\varvec{\Theta }},\psi ) = \sqrt{Th_\mathrm{T}^{d+2}}\ \ (\widehat{{{\varvec{\Theta }} }}_\mathrm{T} - {\varvec{\Theta }}) \ \ \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ), \end{aligned}$$

where \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star \) is a random variable taking its values between \({\varvec{\Theta }}\) and \( \widehat{{{\varvec{\Theta }} }}_\mathrm{T}\). From the hypothesis made on \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}\), it results that \(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star \) also converges a.s. towards \({\varvec{\Theta }}\). The continuity of the function \(m^{(2)}(\cdot ,\psi )\) leads to

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } m^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ) = m^{(2)}({\varvec{\Theta }},\psi ). \end{aligned}$$

For T large enough, we have almost surely

$$\begin{aligned} \left| \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ) -m^{(2)}({\varvec{\Theta }},\psi ) \right|\le & {} \underset{x\in \mathfrak {C}}{\sup } \left| \widehat{m}_\mathrm{T}^{(2)}(\mathbf{x},\psi ) -m^{(2)}(\mathbf{x},\psi ) \right| \\&+\left| m^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi ) -m^{(2)}({\varvec{\Theta }},\psi ) \right| . \end{aligned}$$

The uniform convergence in probability of \( \widehat{m}_\mathrm{T}^{(2)}(\cdot ,\psi )\) to \(m^{(2)}(\cdot ,\psi )\) over \(\mathfrak {C}\) implies the convergence in probability of the sequence \( \widehat{m}_\mathrm{T}^{(2)}(\widehat{{{\varvec{\Theta }} }}_\mathrm{T}^\star ,\psi )\) to the non-null real \(m^{(2)}({\varvec{\Theta }},\psi )\). The conclusion then results from the asymptotic normality of \(\widehat{m}_\mathrm{T}^{(1)}({\varvec{\Theta }},\psi )\), since

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim } f_\mathrm{T}(\mathbf{x})=f(\mathbf{x}) \end{aligned}$$

almost surely and uniformly on the set \(\mathfrak {C}\); see [26] for details. Notice that we have

$$\begin{aligned} \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \left( \frac{M_\mathrm{T}(\mathbf{x},\psi )}{f(\mathbf{x})}\right) ^{(1)}\nonumber \\= & {} \frac{M_\mathrm{T}^{(1)}(\mathbf{x},\psi )f(\mathbf{x})-f^{(1)}(\mathbf{x})M_\mathrm{T}(\mathbf{x},\psi )}{f^2(\mathbf{x})}\nonumber \\= & {} \frac{1}{f^2(\mathbf{x})}\left( \frac{f(\mathbf{x})}{Th_\mathrm{T}^{d+1}} \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt \right. \nonumber \\&\left. -\frac{f^{(1)}(\mathbf{x})}{Th_\mathrm{T}^{d}}\int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right) , \end{aligned}$$
(4.42)

that is,

$$\begin{aligned} \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \frac{1}{Th_\mathrm{T}^{d+1}f^2(\mathbf{x})}\left[ f(\mathbf{x}) \int _0^T \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt \right. \\&\left. -h_\mathrm{T}f^{(1)}(\mathbf{x})\int _0^T \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt\right] , \end{aligned}$$

and

$$\begin{aligned} \widetilde{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )= & {} \frac{1}{Th_\mathrm{T}^{d+1}f^2(\mathbf{x})}\left[ f(\mathbf{x}) \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{i-2}\right] dt \right. \\&\left. -h_\mathrm{T}f^{(1)}(\mathbf{x})\int _0^T \mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{\mathrm{t}-\delta }\right] dt\right] . \end{aligned}$$

We will make use of the following additional notation

$$\begin{aligned} W_{i}(\mathbf{x},\psi )= & {} \frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{(Th_\mathrm{T}^{d})^{1/2}f^2(\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i} \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) dt,\\ \Delta _i(\mathbf{x},\psi )= & {} \left( W_{i}(\mathbf{x},\psi ) - \mathbb {E}\left[ W_{i}(\mathbf{x},\psi )| \mathcal {F}_{i-2 }\right] \right) ,\\ \sigma ^2(\mathbf{x},\psi )= & {} \frac{\Phi _2(\mathbf{x},\psi )}{f(\mathbf{x})}\int _{\mathbb {R}^{d} }\left[ K^{(1)}(\mathbf{u}) \right] ^2d\mathbf{u}, \end{aligned}$$

where

$$\begin{aligned} \Phi _2(\mathbf{x},\psi )=\mathbb {E}\left[ \psi ^2(\mathbf{Y})\mid \mathbf{X}=\mathbf{x}\right] . \end{aligned}$$

Observe that

$$\begin{aligned}&(Th_\mathrm{T}^{d+2})^{1/2} \left( \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )-\widetilde{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )\right) \nonumber \\&\quad =\frac{(Th_\mathrm{T}^{d+2})^{1/2}}{Th_\mathrm{T}^{d+1}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \nonumber \\&\qquad \times \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i}\left( \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) -\mathbb {E}\left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) | \mathcal {F}_{i-2 }\right] \right) dt \nonumber \\&\quad =\sum _{i=1}^n \Delta _i(\mathbf{x},\psi ). \end{aligned}$$
(4.43)

Lemma 4.10, stated below, will play an instrumental role in the proof of Theorem 2.5.

Lemma 4.10

Under assumptions (A.1), (A.3)(i)–(ii), (A.4) and (A.6), as \(n \rightarrow \infty \), we have

$$\begin{aligned} \sum _{i=1}^n \Delta _i(\mathbf{x},\psi )=\sum _{i=1}^n\left( W_{i}(\mathbf{x},\psi ) - \mathbb {E}\left[ W_{i}(\mathbf{x},\psi )| \mathcal {F}_{i-2 }\right] \right) \overset{D}{\rightarrow } N(0,\sigma ^2(\mathbf{x},\psi )). \end{aligned}$$

4.10 Proof of Lemma 4.10

It is easily seen that \((\Delta _{i}(\mathbf{x},\psi ))_{1\le i\le n}\) is a sequence of martingale differences with respect to the sequence of \(\sigma \)-fields \((\mathcal {F}_{i-1})_{1\le i\le n}\). Therefore, we have to check the following two conditions.

(a)
    $$\begin{aligned} \sum _{i=1}^n \mathbb E\left[ \Delta _{i}^2(\mathbf{x},\psi )| \mathcal {F}_{ i-2}\right] \overset{\mathbb {P}}{\rightarrow } \sigma ^2( \mathbf{x},\psi ); \end{aligned}$$
(b)
    $$\begin{aligned} n \mathbb E\left[ \Delta _{i}^2(\mathbf{x},\psi ) \mathbb {1}_{\{|\Delta _{i}(\mathbf{x},\psi )|> \epsilon \}}\right] = o(1)~~\text{ holds, } \text{ for } \text{ any } ~~\epsilon >0. \end{aligned}$$

These conditions are sufficient to establish the asymptotic normality of discrete-time martingale difference sequences (see, for instance, Hall and Heyde [37]).
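Conditions (a) and (b) are the classical conditional-variance and conditional-Lindeberg conditions. The toy simulation below is ours (unrelated to the data structure of the paper) and only illustrates the mechanism: a bounded martingale-difference array, normalised by the sum of its conditional variances, produces an approximately standard normal sum.

```python
import numpy as np

rng = np.random.default_rng(4)

def normalised_sum(n):
    """One realisation of sum(D_i) / sqrt(sum of conditional variances)."""
    eps = rng.choice([-1.0, 1.0], size=n)      # E[eps_i | past] = 0
    s = 1.0 + 0.5 * np.cos(np.arange(n))       # predictable scale factors
    # D_i = s_i * eps_i is bounded, so the Lindeberg condition (b) holds trivially
    return (s * eps).sum() / np.sqrt((s**2).sum())

Z = np.array([normalised_sum(2000) for _ in range(5000)])
print(Z.mean(), Z.std())                       # approximately 0 and 1
```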

4.11 Proof of (a)

First, observe that

$$\begin{aligned} \left| \sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] - \sum _{i=1}^n \mathbb {E} \left[ \Delta ^2_{i}(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \right| = \left| \sum _{i=1}^n \left( \mathbb {E} \left[ W_{i}(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \right) ^2\right| . \end{aligned}$$

Note that, for \(T_{i-1}\le t\le T_i\), we have \(\mathcal {S}_{\mathrm{t},\delta }\subset \mathcal {F}_{i-2}\). Therefore, making use of condition (A.6)(i), we obtain

$$\begin{aligned}&\left| \mathbb {E} \left[ W_{i}(\mathbf{x},\psi )\big | \mathcal {F}_{i-2} \right] \right| \\&\quad = \left| \frac{1}{(Th_\mathrm{T}^{d})^{1/2}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E} \left[ \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}-\mathbf{X}_\mathrm{t}}{h_\mathrm{T}}\right) \big | \mathcal {F}_{i-2} \right] dt \right| \\&\quad \le \frac{1}{(Th_\mathrm{T}^{d})^{1/2}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \int _{\mathrm{T}_{i-1} }^{T_i} \mathbb {E} \left[ \left| \psi (\mathbf{Y}_\mathrm{t}) \right| K^{(1)}\left( \frac{ \mathbf{x} - \mathbf{X}_\mathrm{t} }{h_\mathrm{T}}\right) \big | \mathcal {F}_{i-2} \right] dt\\&\quad \le \frac{\mathcal {M}_\psi }{(Th_\mathrm{T}^{d})^{1/2}}\frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ K^{(1)}\left( \frac{ \mathbf{x} - \mathbf{X}_\mathrm{t} }{h_\mathrm{T}}\right) \big | \mathcal {F}_{i-2} \right] dt. \end{aligned}$$

Using Taylor’s formula combined with assumption (A.4), we obtain

$$\begin{aligned}&\left| \sum _{i=1}^n \left( \mathbb {E} \left[ W_{i}(\mathbf{x},\psi )\big | \mathcal {F}_{i-2} \right] \right) ^2\right| \\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{\mathcal {M}_\psi ^2}{Th_\mathrm{T}^{d}} \sum _{i=1}^n\left( \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^d} K^{(1)}\left( \frac{\mathbf{x} - \mathbf{y}}{h_\mathrm{T}} \right) f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{y}) d\mathbf{y} dt\right) ^2\\&\quad =\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{\mathcal {M}_\psi ^2h_\mathrm{T}^{d}}{T} \sum _{i=1}^n\left( \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^d} K^{(1)}( \mathbf{z}) f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{x} -h_\mathrm{T}{} \mathbf{z}) d\mathbf{z} dt\right) ^2 \\&\quad = \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2 \frac{\mathcal {M}_\psi ^2h_\mathrm{T}^{d}}{\delta } \left( \frac{1}{n} \sum _{i=1}^n \left( \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{x}) dt \right) ^2+O(h_\mathrm{T}) \right) \\&\quad :=\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2 \frac{\mathcal {M}_\psi ^2h_\mathrm{T}^{d}}{\delta } \frac{1}{ n} \sum _{i=1}^n g_i^{\mathcal {F}_{i-2}} (\mathbf{x}) + O\left( h_\mathrm{T}^{d} \right) \\&\quad =O\left( h_\mathrm{T}^{d} \right) , \end{aligned}$$

where

$$\begin{aligned} g_i^{\mathcal {F}_{i-2} } (\mathbf{x}) = \left( \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{i-2} } (\mathbf{x}) dt \right) ^2, \end{aligned}$$

is a stationary and ergodic process. So the sum \( \frac{1}{n} \sum _{i=1}^n g_i^{\mathcal {F}_{i-2} } (\mathbf{x})\) has a finite limit (see Krengel [47, Theorem 4.4]), which is

$$\begin{aligned} \mathbb {E}\left[ g_1^{\mathcal {F}_{-\delta } } (\mathbf{x}) \right] =g_1(\mathbf{x}) =\left( \int ^{\delta }_{0} f (\mathbf{x}) dt \right) ^2 = \delta ^2 f^2(\mathbf{x}) . \end{aligned}$$
(4.44)

Moreover, observe that, by assumption (A.3), we have

$$\begin{aligned} \frac{(f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x}))}{f^2(\mathbf{x})}= \frac{1}{f(\mathbf{x})}+O(h_\mathrm{T})= \frac{1}{f(\mathbf{x})}+o(1). \end{aligned}$$

Using Jensen's inequality and assumption (A.6)(iii), we obtain

$$\begin{aligned}&\sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \\&\quad = \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \mathbb {E} \left[ \left( \int _{\mathrm{T}_{i-1}}^{T_i} \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) dt \right) ^2\big | \mathcal {F}_{i-2} \right] \\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \mathbb {E} \left[ \psi ^2(\mathbf{Y}_\mathrm{t}) \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {S}_{\mathrm{t},\delta } \right] \big | \mathcal {F}_{i-2} \right] dt \\&\quad = \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2 \frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \Phi _2(\mathbf{X}_\mathrm{t},\psi ) \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt. \end{aligned}$$

According to assumption (A.6)(iii), the function \(\Phi _2(\cdot ,\psi )\) is continuous in a neighbourhood of \(\mathbf{x}\); hence we have

$$\begin{aligned}&\sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ |\Phi _2(\mathbf{X}_\mathrm{t},\psi )-\Phi _2(\mathbf{x},\psi )| \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\qquad +\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \Phi _2(\mathbf{x},\psi ) \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\quad \le \left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{1}{Th_\mathrm{T}^{d}} \underset{\parallel \mathbf{x}-\mathbf{v}\parallel \le h_\mathrm{T}}{\sup }|\Phi _2(\mathbf{v},\psi )-\Phi _2(\mathbf{x},\psi )|\\&\qquad \times \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\qquad +\left( \frac{f(\mathbf{x})-h_\mathrm{T}f^{(1)}(\mathbf{x})}{f^2(\mathbf{x})}\right) ^2\frac{\Phi _2(\mathbf{x},\psi ) }{Th_\mathrm{T}^{d}} \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt\\&\quad =\frac{ 1}{Th_\mathrm{T}^{d}f^2(\mathbf{x})}\left( \Phi _2(\mathbf{x},\psi )+o(h_\mathrm{T}) \right) \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt. \end{aligned}$$

By a first-order Taylor expansion of the function \(f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}}}\), for some \(\mathbf{x}^*\) in \([\mathbf{x}-h_\mathrm{T}{} \mathbf{v},\mathbf{x}]\), we obtain

$$\begin{aligned}&\sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \\&\quad \le \frac{\Phi _2(\mathbf{x},\psi )}{Th_\mathrm{T}^{d}f^2(\mathbf{x})} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \mathbb {E} \left[ \left( K^{(1)} \left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right) ^2\big | \mathcal {F}_{i-2} \right] dt \\&\quad =\frac{\Phi _2(\mathbf{x},\psi )}{Th_\mathrm{T}^{d}f^2(\mathbf{x})} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^{d}} \left( K^{(1)}\right) ^2\left( \frac{\mathbf{x} - \mathbf{u}}{h_\mathrm{T}} \right) f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{u}) d\mathbf{u} dt \\&\quad = \frac{\Phi _2(\mathbf{x},\psi )}{Tf^2(\mathbf{x})} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} \int _{\mathbb {R}^{d} } \left( K^{(1)}\right) ^2(\mathbf{v})f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x}-h_\mathrm{T}{} \mathbf{v}) d\mathbf{v} dt\\&\quad = \frac{\Phi _2(\mathbf{x},\psi )}{\delta f^2(\mathbf{x})} \left( \frac{1}{n} \sum _{i=1}^n \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x}) dt+ O(h_\mathrm{T}) \right) \int _{\mathbb {R}^{d} } \left( K^{(1)}\right) ^2(\mathbf{v}) d\mathbf{v}. \end{aligned}$$

It is clear, whenever \(\delta \) is small enough, that the quantities

$$\begin{aligned} \left( \int _{\mathrm{T}_{i-1}}^{T_i} f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x} ) dt \right) _{i\in \mathbb {N}} \end{aligned}$$

may be approximated by

$$\begin{aligned} \left( \delta f_{\mathrm{T}_{i-1}}^{\mathcal {F}_{\mathrm{T}_{i-2}} } (\mathbf{x} )\right) _{i\in \mathbb {N}}. \end{aligned}$$

Consequently, using the ergodicity and stationarity of the process \((\mathbf{X}_\mathrm{t})_{\mathrm{t} \ge 0}\), it follows that

$$\begin{aligned} \frac{1}{n} \sum _{j=1}^n \left( \int _{\mathrm{T}_{j-1}}^{T_j} f_\mathrm{T}^{\mathcal {F}_{\mathrm{T}_{j-2}} } (\mathbf{x}) dt \right)= & {} \mathbb {E} \left( \int _{\mathrm{T}_0}^{T_1} f_\mathrm{T}(\mathbf{x}) dt \right) + o(1)\\= & {} \int _0^{\delta } \mathbb {E} \left( f_\mathrm{T}(\mathbf{x}) \right) dt + o(1)\\= & {} \delta f(\mathbf{x}) + o(1). \end{aligned}$$

It follows that

$$\begin{aligned} \sum _{i=1}^n \mathbb {E} \left[ W_{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right]= & {} \frac{\Phi _2(\mathbf{x},\psi )}{ f(\mathbf{x})}\int _{\mathbb {R}^d} \left( K^{(1)}\right) ^2(\mathbf{y}) d\mathbf{y} +O(h_\mathrm{T}). \end{aligned}$$

Letting \(T\rightarrow \infty \), and taking into account the bound \(O(h_\mathrm{T}^{d})\) obtained above for \(\sum _{i=1}^n \left( \mathbb {E} \left[ W_{i}(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right] \right) ^2\), this implies that

$$\begin{aligned} \underset{T\rightarrow \infty }{\lim }\sum _{i=1}^n \mathbb {E} \left[ \Delta _{i}^2(\mathbf{x},\psi ) \big | \mathcal {F}_{i-2} \right]= & {} \frac{\Phi _2(\mathbf{x},\psi )}{ f(\mathbf{x})}\int _{\mathbb {R}^d} \left( K^{(1)}\right) ^2(\mathbf{y}) d\mathbf{y}=\sigma ^2(\mathbf{x},\psi ), \end{aligned}$$
(4.45)

4.12 Proof of (b)

Using Hölder's, Markov's, Jensen's and Minkowski's inequalities, together with assumption (A.6)(iii), we obtain, for all \(\epsilon >0\) and all \(p\) and \(q\) such that

$$\begin{aligned} \frac{1}{p}+\frac{1}{q}=1, \end{aligned}$$

that

$$\begin{aligned}&\mathbb E[\Delta _{\mathrm{T},i}^2(\mathbf{x}) \mathbb {1}_{\{|\Delta _{\mathrm{T},i}(\mathbf{x})|> \epsilon \}}]\\&\quad \le ( \mathbb E[\Delta _{\mathrm{T},i}^{2q}(\mathbf{x})])^{1/q} (P\{|\Delta _{\mathrm{T},i}(\mathbf{x})| > \epsilon \})^{1/p} \\&\quad \le \epsilon ^{-2q/p}\mathbb E[|\Delta _{\mathrm{T},i}(\mathbf{x} )|^{2q}]\\&\quad =\frac{ \epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q}(\mathbf{x})}\mathbb {E}\left[ \int _{\mathrm{T}_{i-1}}^{T_i} \left| \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) -\mathbb E\left[ \psi (\mathbf{Y}_\mathrm{t})K^{(1)}\left( \frac{\mathbf{x} - \mathbf{X}_\mathrm{t} }{h_\mathrm{T}}\right) | \mathcal {F}_{i-2}\right] \right| ^{2q} dt \right] \\&\quad \le \frac{2^{2q} \epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t}) K^{(1)}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right| ^{2q} \right] dt\\&\quad = \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ \mathbb {E}\left[ \left| \psi (\mathbf{Y}_\mathrm{t})\right| ^{2q}| \mathcal {S}_{\mathrm{t},\delta }\right] \left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right] dt\\&\quad = \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ h_{2q}(\mathbf{X}_\mathrm{t},\psi ) \left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right] dt\\&\quad \le \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\left( \underset{\parallel \mathbf{x}-\mathbf{v}\parallel \le h_\mathrm{T}}{\sup }\left| h_{2q}(\mathbf{v},\psi )-h_{2q}(\mathbf{x},\psi )\right| +h_{2q}(\mathbf{x},\psi ) \right) \\&\qquad \times \int _{\mathrm{T}_{i-1}}^{T_i}\mathbb {E}\left[ \left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{X}_\mathrm{t}}{h_\mathrm{T}} \right) \right] dt\\&\quad = \frac{ 2^{2q}\epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\left( h_{2q}(\mathbf{x},\psi ) +o(1)\right) \int _{\mathrm{T}_{i-1}}^{T_i}\int _{\mathbb {R^d}}\left( K^{(1)}\right) ^{2q}\left( \frac{\mathbf{x}- \mathbf{u}}{h_\mathrm{T}} \right) f(\mathbf{u}) d\mathbf{u} dt\\&\quad =\frac{ 2^{2q}\delta \epsilon ^{-2q/p}}{(Th_\mathrm{T}^d)^qf^{2q} (\mathbf{x})}\left( h_{2q}(\mathbf{x},\psi ) +o(1)\right) \int _{\mathbb {R^d}}\left( K^{(1)}\right) ^{2q}\left( \mathbf{u} \right) f(\mathbf{x}-h_\mathrm{T}{} \mathbf{u}) d\mathbf{u}. \end{aligned}$$

By a first-order Taylor expansion, we have

$$\begin{aligned}&n\,\mathbb {E}[\Delta _{\mathrm{T},i}^2(\mathbf{x}) \mathbb {1}_{\{|\Delta _{\mathrm{T},i}(\mathbf{x})| > \epsilon \}}]\nonumber \\&\quad \le \frac{2^{2q} \epsilon ^{-2q/p}\left\| \left( K^{(1)}\right) ^{2q}\left( \mathbf{v} \right) \right\| _\infty }{(Th_\mathrm{T}^{d})^{(q-1)}f^{2q-1} (\mathbf{x})}\left( h_{2q}(\mathbf{x},\psi ) +o(1)\right) \nonumber \\&\quad =o(1). \end{aligned}$$
(4.46)

Combining statements (4.45) and (4.46), conditions (a) and (b) are satisfied; hence, the central limit theorem for martingale difference sequences, together with (4.43), yields

$$\begin{aligned} (Th_\mathrm{T}^{d+2})^{1/2} \left( \widehat{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )- \widetilde{m}_\mathrm{T}^{(1)}(\mathbf{x},\psi )\right) \overset{\mathcal {D}}{\rightarrow } \mathcal {N}\left( 0, \frac{\Phi _2(\mathbf{x},\psi )}{f(\mathbf{x})}\int _{\mathbb {R}^d} \left( K^{(1)}\right) ^2(\mathbf{y}) d\mathbf{y} \right) . \end{aligned}$$
(4.47)
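The limiting variance in (4.47) can be checked by simulation in a simplified i.i.d. setting (our stand-in for the sampled ergodic path; all numerical choices below are ours): with \(d=1\), \(\psi (y)=y\) and a Gaussian \(K\), the normalised statistic has empirical variance close to \(\Phi _2(\mathbf{x},\psi )\int (K^{(1)})^2(\mathbf{y})d\mathbf{y}/f(\mathbf{x})\).

```python
import numpy as np

rng = np.random.default_rng(5)

K1 = lambda u: -u * np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian K^(1)
x, h, n, reps = 0.0, 0.05, 20000, 1000
f_x = 1.0 / np.sqrt(2.0 * np.pi)        # density of X ~ N(0,1) at x = 0
stats = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal(n)
    Y = np.sin(X) + 0.3 * rng.standard_normal(n)  # Phi_2(0) = E[Y^2 | X=0] = 0.09
    S = (Y * K1((x - X) / h)).sum() / (n * h**2 * f_x)
    stats[r] = np.sqrt(n * h**3) * S              # normalisation as in (4.47), d = 1
sigma2 = 0.09 * (1.0 / (4.0 * np.sqrt(np.pi))) / f_x  # Phi_2 * int (K')^2 / f
print(stats.var(ddof=1), sigma2)                      # both approximately 0.032
```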

Lemmas 4.9 and 4.10 combined with Theorem 2.4 complete the proof of Theorem 2.5.\(\square \)