1 Introduction

The least-squares (LS) method is an important global approximation method based on regular or concentrated sample points. However, many practical applications in engineering and machine learning produce irregular or scattered samples [1–4], and these also need to be analyzed. The moving least-squares (MLS) method was introduced by McLain [4] to draw a set of contours from a cluster of scattered data points. It has turned out to be a useful local approximation tool in various fields of mathematics such as approximation theory, data smoothing [5], statistics [6], and numerical analysis [7]. Recently, research effort has been devoted to regression learning algorithms based on the MLS method, see [8–12]. The main advantage of the MLS regression learning algorithm is that the regression function can be learned in a simple function space, usually one generated by polynomials.

We briefly recall the regression learning problem for the MLS method. Functions for learning are defined on a compact metric space X (the input space) and take values in \(Y=\mathbb{R}\) (the output space). The sampling process is governed by an unknown Borel probability measure ρ on \(Z= X\times Y\). We define the regression function as follows:

$$f_{\rho}(x)= \int_{Y} y\, d\rho(y|x), $$

where \(\rho(\cdot|x)\) is the conditional probability measure induced by ρ on Y given \(x\in X\). The goal of regression learning is to find a good approximation of the regression function \(f_{\rho}\) based on a set of random samples \(\mathbf{z}=\{z_{i}\}_{i=1}^{m}=\{(x_{i}, y_{i})\}_{i=1}^{m} \in Z^{m}\) drawn according to the measure ρ.

We define the approximation \(f_{\mathbf{z}}\) of \(f_{\rho}\) pointwise:

$$ f_{\mathbf{z}}(x):=f_{\mathbf{z},\sigma,x}(x)=\arg\min _{f\in \mathcal{H}}\mathcal{E}_{\mathbf{z},x}(f),\quad x\in X, $$
(1.1)

where the local moving empirical error is defined by

$$ \mathcal{E}_{\mathbf{z},x}(f)=\frac{1}{m}\sum _{i=1}^{m}\Phi \biggl(\frac{x}{\sigma}, \frac{x_{i}}{\sigma} \biggr) \bigl(f(x_{i})-y_{i} \bigr)^{2}, $$
(1.2)

where the hypothesis space \(\mathcal{H}\subseteq C(X)\) is a finite-dimensional Lipschitz function space, \(\sigma=\sigma(m)>0\) is a window width, and \(\Phi:\mathbb{R}^{n}\times\mathbb{R}^{n}\to \mathbb{R}^{+}\) is called an MLS weight function, which satisfies the following conditions, see [9, 10]:

$$\begin{aligned} (1)&\quad \int_{\mathbb{R}^{n}}\Phi (x,t)\,dt=1,\quad \forall x \in \mathbb{R}^{n}, \end{aligned}$$
(1.3)
$$\begin{aligned} (2)&\quad \Phi(x,t)\geq c_{q},\quad \forall \vert x-t \vert \leq1, \end{aligned}$$
(1.4)
$$\begin{aligned} (3)&\quad \bigl\vert \Phi(x,t) \bigr\vert \leq\frac{\tilde{c_{q}}}{(1+ \vert x-t \vert )^{q}},\quad \forall x,t\in\mathbb{R}^{n}, \end{aligned}$$
(1.5)

where the constants \(q>n+1\), \(c_{q}, \tilde{c_{q}}>0\).
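
To make the algorithm concrete, the following is a minimal numerical sketch of (1.1)–(1.2) in Python (ours, not taken from the references): \(\mathcal{H}\) is taken to be a low-degree polynomial space on \(X\subset\mathbb{R}\), the weight `mls_weight` is an illustrative choice mimicking the decay condition (1.5), and the data model and all parameter values are assumptions made only for illustration.

```python
import numpy as np

def mls_weight(s, t, q=3.0):
    """Illustrative weight with polynomial decay, cf. condition (1.5); here n = 1."""
    return 1.0 / (1.0 + np.abs(s - t)) ** q

def f_z(x, xs, ys, sigma, degree=1):
    """Evaluate the MLS estimator (1.1) at a single point x.

    Minimizes the local moving empirical error (1.2) over polynomials of
    the given degree, i.e., solves a weighted least-squares problem.
    """
    w = mls_weight(x / sigma, xs / sigma)
    A = np.vander(xs, N=degree + 1, increasing=True)  # design matrix of H
    AW = A * w[:, None]
    coef = np.linalg.solve(AW.T @ A, AW.T @ ys)       # weighted normal equations
    return np.polyval(coef[::-1], x)

# Toy data: y_i = sin(2*pi*x_i) + noise, so f_rho(x) = sin(2*pi*x).
rng = np.random.default_rng(0)
m = 200
xs = rng.uniform(0.0, 1.0, m)
ys = np.sin(2 * np.pi * xs) + 0.1 * rng.normal(size=m)

sigma = m ** (-0.2)                 # window width sigma = sigma(m)
print(f_z(0.3, xs, ys, sigma), np.sin(2 * np.pi * 0.3))
```

Note that the minimizer in (1.1) is recomputed at every evaluation point x, which is what makes the method "moving".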

The task of this paper is to derive a bound for the error \(\|f_{\mathbf {z}}-f_{\rho}\|_{\rho_{X}}\), where \(\|f(\cdot)\|_{\rho_{X}}:=(\int_{X}|f(\cdot)|^{2}\, d{\rho _{X}})^{\frac{1}{2}}\), in order to evaluate the approximation ability of \(f_{\mathbf{z}}\), see [13–22]. The error analysis of algorithm (1.1) for independent and identically distributed (i.i.d.) samples has been carried out in [8–10]. In many real data analysis problems, however, such as market prediction, system diagnosis, and speech recognition, the samples are not independent, although they may be close to independent. Mixing conditions quantify how close to independence a sequence of random samples is. In [14, 16, 23–25], the authors carried out the regression estimation of the least-squares algorithm with α-mixing samples. Up to now there has been no result on algorithm (1.1) in the case of dependent samples. Hence we extend the analysis of algorithm (1.1) to the α-mixing sampling setting, a condition that is relatively easy to establish for many processes, see [26].

Definition 1.1

Let \(\mathcal {M}_{a}^{b} \) denote the σ-algebra of events generated by the random samples \(\{z_{i}=(x_{i}, y_{i})\}_{i=a}^{b}\). The sequence \(\{z_{i}\}_{i\geq1}\) is said to satisfy a strong mixing condition (or α-mixing condition) if

$$ \alpha_{i}= \sup_{k\ge1} \sup_{A\in\mathcal {M}_{1}^{k},B\in\mathcal{M}_{k+i}^{\infty}} \bigl\vert P(A \cap B)-P(A)P(B) \bigr\vert \longrightarrow0,\quad \mbox{as } i\to\infty. $$
(1.6)

Specifically, if there exist positive constants \(\overline{\alpha}\), \(\beta\), and \(c\) such that

$$ \alpha_{i}\leq\overline{\alpha }\exp\bigl(-ci^{\beta} \bigr),\quad \forall i\geq1, $$
(1.7)

then \(\{z_{i}\}_{i\geq1}\) is said to satisfy an exponential strong mixing condition.
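
As a point of reference (our addition), stationary Gaussian AR(1) processes with \(|a|<1\) form a standard family of examples satisfying the exponential strong mixing condition (1.7), so dependent samples of this type are easy to simulate; the regression model and the parameter values below are illustrative assumptions.

```python
import numpy as np

def ar1_mixing_samples(m, a=0.5, noise=0.1, seed=0):
    """Draw m samples (x_i, y_i) whose inputs form a stationary Gaussian
    AR(1) chain x_{i+1} = a*x_i + e_i; for |a| < 1 such chains are
    geometrically alpha-mixing, i.e., condition (1.7) holds."""
    rng = np.random.default_rng(seed)
    x = np.empty(m)
    x[0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - a ** 2)))  # stationary start
    for i in range(m - 1):
        x[i + 1] = a * x[i] + rng.normal()
    y = np.sin(x) + noise * rng.normal(size=m)  # toy regression model
    return x, y

x, y = ar1_mixing_samples(500)
print(np.corrcoef(x[:-1], x[1:])[0, 1])  # lag-1 correlation, close to a = 0.5
```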

Our goal is to obtain the convergence rate of algorithm (1.1) as \(m \to\infty\) under hypothesis (1.7). The rest of the paper is organized as follows. In Sect. 2, we review some concepts, state our main results, and present the error decomposition. In Sect. 3, we estimate the sample error. In Sect. 4, we provide the proofs of the main results.

2 Main results and error decomposition

Before giving the main results, we first provide some concepts that will be referred to throughout this paper, see [8–10].

Definition 2.1

The probability measure \(\rho_{X} \) on X is said to satisfy the condition \(L_{\tau}\) with exponent \(\tau>0\) if

$$ \rho_{X} \bigl(B(x,r)\bigr)\geq c_{\tau}r^{\tau}, \quad \forall 0< r\leq r_{0}, x\in X, $$
(2.1)

where the constants \(r_{0}>0\), \(c_{\tau}>0\), and \(B(x,r)=\{u\in X: |u-x|\leq r\}\) for \(r>0\).

Definition 2.2

We say that the hypothesis space \(\mathcal{H}\) satisfies the norming condition with exponent \(\zeta>0\) and \(d\in\mathbb{N}\) if we can find points \(\{u_{i}\}_{i=1}^{d}\subset B(x,\sigma)\) for every \(x\in X\) and \(0<\sigma\leq\sigma_{0}\) satisfying \(|u_{i}-u_{j}|\geq 2c_{\mathcal{H}}\sigma\) for \(i\neq j\) and

$$ \Biggl(\sum_{i=1}^{d} \bigl\vert f(u_{i}) \bigr\vert ^{2} \Biggr)^{\frac{1}{2}}\geq c_{\mathcal{H}}\sigma^{\zeta}\|f\|_{C(X)},\quad \forall f\in \mathcal{H}, $$
(2.2)

where the constants \(\sigma_{0}>0\), \(c_{\mathcal{H}}>0\), and d is at least the dimension of \(\mathcal{H}\).

Here we assume \(|y|\leq M\) almost surely, and all constants such as \(C_{\mathcal{H},\zeta}\), \(A_{\tau,\zeta}\), \(C_{\mathcal{H},\rho _{X}}\), \(C'_{\mathcal{H},\rho_{X}}\), and so on are independent of the key parameters δ, m, and σ in this paper. Now we give our main results for algorithm (1.1).

Theorem 2.1

Assume that (1.7), (2.1), and (2.2) hold. Suppose \(0< p<2\), \(\sigma= (m^{(\alpha)} )^{-\gamma}\) with \(m^{(\alpha)}= \lfloor m \lceil (\frac{8m}{c} )^{1/(1+\beta)} \rceil^{-1} \rfloor\), \(\gamma>0\), and \(0<\sigma\leq\min \{\sigma_{0},1,(r_{0}/C_{\mathcal{H},\zeta })^{1/\max\{\zeta,1\}} \}\). If m satisfies

$$\begin{aligned} \bigl(m^{(\alpha)} \bigr)^{1-\gamma\tau\max\{\zeta,1\}} \geq& \bigl(256c_{p}^{-\frac{1}{p}}/3+A_{\tau,\zeta} \bigr) \biggl(\log \frac{2+8e^{-2}\overline{\alpha}}{\delta} \biggr)^{1+\frac{1}{p}} \\ &{}+A_{\tau,\zeta}\gamma\log m^{(\alpha)}, \end{aligned}$$
(2.3)

then for any \(0<\delta<1\), with confidence \(1-\delta\), we have

$$ \|f_{\mathbf{z}}-f_{\rho}\|_{\rho_{X}}^{2} \leq\widetilde{C} \bigl(m^{(\alpha)} \bigr)^{4\gamma(\zeta+\frac{\tau}{2}\max\{\zeta,1\} )-n\gamma-\frac{1}{p+1}}. $$
(2.4)
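
For intuition about the effective number of observations, the short sketch below (ours; the mixing constants c and β and the exponent γ are arbitrary illustrative values) evaluates \(m^{(\alpha)}\) and the corresponding window width \(\sigma=(m^{(\alpha)})^{-\gamma}\).

```python
import math

def effective_sample_size(m, c, beta):
    """m^(alpha) = floor(m / ceil((8m/c)^(1/(1+beta)))), cf. Theorem 2.1."""
    block = math.ceil((8 * m / c) ** (1.0 / (1.0 + beta)))
    return m // block

m, c, beta, gamma = 10_000, 1.0, 1.0, 0.1   # illustrative assumptions
m_alpha = effective_sample_size(m, c, beta)
sigma = m_alpha ** (-gamma)
print(m_alpha, sigma)                        # 35 and about 0.70 here
```

Note that \(m^{(\alpha)}\) grows much more slowly than m; the faster the mixing coefficients decay (the larger β is), the closer \(m^{(\alpha)}\) is to m.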

By selecting a suitable parameter \(\sigma=\sigma(m)\), we can then obtain the explicit learning rate of algorithm (1.1).

Theorem 2.2

Under the assumptions of Theorem 2.1, if we choose \(\sigma= (m^{(\alpha)} )^{-\frac{\varepsilon}{4\zeta+2\max\{\tau,\tau\zeta\}}}\), \(0<\varepsilon<1/4\), and

$$ m^{(\alpha)}\geq C_{1} \biggl( \biggl(\log \frac{2+8e^{-2}\overline{\alpha}}{\delta} \biggr)^{1+\frac{1}{p}}+\log m^{(\alpha)} \biggr)^{2}+\sigma _{0}^{-(4\zeta+2\max\{\tau,\tau\zeta\})/\varepsilon}, $$
(2.5)

then with confidence \(1-\delta\), we have

$$ \|f_{\mathbf{z}}-f_{\rho}\|_{\rho_{X}}\leq C_{2} \bigl(m^{(\alpha )} \bigr)^{\varepsilon-\frac{1}{2}}, $$
(2.6)

where

$$ C_{1}= \biggl(A_{\tau,\zeta}+\frac{256}{3}c_{p}^{-\frac{1}{p}} \biggr)^{2} \biggl(1+\frac{1}{4\zeta+2\max\{\tau,\tau\zeta\}} \biggr)^{2}. $$
(2.7)

Remark 2.1

The above theorem shows that the learning rate tends to \((m^{(\alpha)})^{-\frac{1}{2}}\) as \(\varepsilon\to0\) (equivalently, as \(\sigma\rightarrow1\)). For the i.i.d. case, the same rate \(m^{-\frac{1}{2}}\) was obtained in [9, 10].

To estimate the total error \(\|f_{\mathbf{z}}-f_{\rho }\|_{\rho_{X}}\), we use the following proposition from [8].

Proposition 2.1

Assume (2.1) and (2.2) hold. Then we have

$$ \|f_{\mathbf{z}}-f_{\mathcal{H}}\|_{\rho_{X}}^{2} \leq\widetilde {C_{\mathcal{H}}}\sigma^{-2\zeta-\tau\max\{\zeta,1\}} \int_{X} \bigl(\mathcal{E}_{x}(f_{\mathbf{z},\sigma,x})- \mathcal {E}_{x}(f_{\mathcal{H},\sigma,x}) \bigr)\, d\rho_{X}(x), $$
(2.8)

where

$$ \mathcal{E}_{x}(f)= \int_{Z}\Phi \biggl(\frac{x}{\sigma},\frac {u}{\sigma} \biggr) \bigl(f(u)-y\bigr)^{2}\, d\rho(u,y),\quad \forall f: X \rightarrow \mathbb{R} $$
(2.9)

is called the local moving expected risk and

$$ f_{\mathcal{H}}(x):=f_{\mathcal{H},\sigma,x}(x)\quad \mbox{with } f_{\mathcal{H},\sigma,x}=\arg\min _{f\in \mathcal{H}}\mathcal{E}_{x}(f),\quad x\in X, $$
(2.10)

is called the target function.

Remark 2.2

Here we assume \(f_{\rho}\in\mathcal{H}\). It follows from

$$ \mathcal{E}_{x}(f)= \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac {u}{\sigma} \biggr) \bigl(f(u)-f_{\rho}(u)\bigr)^{2}\, d \rho_{X}(u)+\mathcal {E}_{x}(f_{\rho}),\quad \forall f: X\rightarrow\mathbb{R}, $$
(2.11)

that \(f_{\mathcal{H}}=f_{\rho}\). Thus \(\|f_{\mathbf{z}}-f_{\rho}\| _{\rho_{X}}=\|f_{\mathbf{z}}-f_{\mathcal{H}}\|_{\rho_{X}}\).

Next we only need to provide an upper bound for the integral in (2.8). Since \(f_{\mathbf{z},\sigma,x}\) minimizes the empirical error (1.2) over \(\mathcal{H}\), we have \(\mathcal{E}_{\mathbf{z},x}(f_{\mathbf{z},\sigma,x})-\mathcal{E}_{\mathbf{z},x}(f_{\mathcal{H},\sigma,x})\leq0\), which yields the following decomposition:

$$\begin{aligned} \int_{X} \bigl(\mathcal{E}_{x}(f_{\mathbf{z},\sigma,x})- \mathcal {E}_{x}(f_{\mathcal{H},\sigma,x}) \bigr)\,d\rho_{X}(x) \leq& \int _{X} \bigl[ \bigl(\mathcal{E}_{x}(f_{\mathbf{z},\sigma,x})- \mathcal {E}_{x}(f_{\mathcal{H},\sigma,x}) \bigr) \\ &{} - \bigl(\mathcal{E}_{\mathbf{z},x}(f_{\mathbf{z},\sigma ,x})-\mathcal{E}_{\mathbf{z},x}(f_{\mathcal{H},\sigma,x}) \bigr) \bigr]\,d\rho_{X}(x) \\ :=&\mathcal{S}(\mathbf{z},\sigma). \end{aligned}$$
(2.12)

What is left is to estimate the sample error \(\mathcal{S}(\mathbf {z},\sigma)\).

3 Estimates for the sample error

In order to obtain the probability estimate of \(\mathcal{S}(\mathbf {z},\sigma)\), we shall use upper bounds for \(f_{\mathbf{z},\sigma ,x}\) and \(f_{\mathcal{H},\sigma,x}\). We first derive the confidence-based estimate for \(f_{\mathbf {z},\sigma,x}\).

Proposition 3.1

Under the assumptions of Theorem 2.1, if

$$ m^{(\alpha)}\geq-A_{\tau,\zeta}\log(\delta\sigma) \sigma^{-\tau \max\{\zeta,1\}}, $$
(3.1)

then with confidence at least \(1-\delta\), we have

$$ \|f_{\mathbf{z},\sigma,x}\|_{C(X)}\leq\frac{2^{3+\tau+\zeta }M}{\sqrt{c_{\tau}c_{q}}C_{\mathcal{H},\zeta}^{\tau/2}c_{\mathcal {H}}} \sigma^{-\zeta-\max\{\frac{\tau}{2},\frac{\tau\zeta}{2}\} }:=C_{\mathcal{H},\rho_{X}}\sigma^{-\zeta-\max\{\frac{\tau }{2},\frac{\tau\zeta}{2}\}},\quad \forall x\in X, $$
(3.2)

where

$$\begin{aligned}& C_{\mathcal{H},\zeta}=\min \biggl\{ \frac{c_{\mathcal{H}}}{2^{\zeta +1}\sqrt{d}C_{\mathcal{H},0}}, \frac{c_{\mathcal{H}}}{2}, \frac {1}{2} \biggr\} , \\& A_{\tau,\zeta}=2^{\tau+1}\bigl(c_{\tau}C_{\mathcal{H},\zeta}^{\tau }-2^{\tau+1} \bigr)^{-1} \biggl[\frac{7}{6}+\frac{7}{6}\log \bigl(1+4e^{-2}\overline{\alpha}\bigr)+\frac{7n}{6}\log \biggl(1+ \frac {4B_{X}}{C_{\mathcal{H},\zeta}} \biggr) \biggr]. \end{aligned}$$

The proof is analogous to that of Theorem 3 in [8], except that Lemma 2 in [8] is replaced by the following Lemma 3.1, which covers the dependent sampling setting.

Lemma 3.1

Let \(0< r\leq r_{0}\) and \(0<\delta<1\). If (1.7) and (2.1) hold, then with confidence \(1-\delta\), we have

$$\begin{aligned} \frac{\sharp({\mathbf{x}}\cap B(x,r))}{m} \geq&c_{\tau} \biggl(\frac{r}{2} \biggr)^{\tau}+\frac{7\log\delta-7\log (1+4e^{-2}\overline{\alpha})-7n\log(\frac {4B_{X}}{r}+1)}{6m^{(\alpha)}} \\ &{}-1,\quad \forall x\in X. \end{aligned}$$
(3.3)

Specifically, if

$$ m^{(\alpha)}>\frac{ [7\log(\frac{1}{\delta})+7\log (1+4e^{-2}\overline{\alpha}) +7n\log(\frac{4B_{X}}{r}+1) ]}{6(\frac{c_{\tau}r^{\tau }}{2^{\tau+1}}-1)}, $$
(3.4)

then with confidence at least \(1-\delta\), we have

$$ \frac{\sharp({\mathbf{x}}\cap B(x,r))}{m}\geq\frac{c_{\tau }}{2^{\tau+1}}r^{\tau},\quad \forall x\in X, $$
(3.5)

where \(\frac{\sharp({\mathbf{x}}\cap B(x,r))}{m}\) is the proportion of those sampling points lying in \(B(x,r)\).

Proof

It is shown in Theorem 5.3 of [27] that one can find \(\{v_{j}\}_{j=1}^{\mathcal{N}}\subseteq X\) satisfying \(X\subseteq B_{R}(\mathbb{R}^{n})\subseteq\bigcup_{j=1}^{\mathcal {N}}B(v_{j},\frac{r}{2})\) and \(\mathcal{N}\leq(\frac{4R}{r}+1)^{n}\), where \(R=B_{X}\) is the radius of a ball in \(\mathbb{R}^{n}\) containing X. Let \(\xi^{(j)}: X\rightarrow\mathbb{R}\) be the characteristic function of the set \(B(v_{j},\frac{r}{2})\). Its mean \(\mu^{(j)}=\int _{X} \xi^{(j)}(x)\,d\rho_{X}=\rho_{X}(B(v_{j},\frac{r}{2}))\) satisfies \(|\xi^{(j)}-\mu^{(j)}|\leq1\) and \(\sigma^{2}(\xi ^{(j)})\leq1\). Now we use the following Bernstein inequality for dependent samples from [28].

Proposition 3.2

Suppose that (1.7) holds, and let \(m^{(\alpha)}\) be the effective number of observations. Let ξ be a real-valued function on the probability space Z with mean \(\mu=\int_{Z} \xi(z)\,d\rho\) and variance \(\sigma ^{2}\), and set \(\xi_{i}=\xi(z_{i})\). Assume that \(|\xi_{i}-\mu|\leq D\) almost surely. Then, for every \(\varepsilon>0\),

$$ P \Biggl\{ \frac{1}{m}\sum_{i=1}^{m} [\xi_{i}-\mu ]>\varepsilon \Biggr\} \leq\bigl(1+4e^{-2} \overline{\alpha}\bigr)\exp \biggl\{ -\frac{m^{(\alpha)}\varepsilon^{2}}{2(\sigma^{2}+\frac {1}{3}D\varepsilon)} \biggr\} . $$
(3.6)

Then it follows from the above proposition that

$$ P \Biggl\{ \frac{1}{m}\sum_{i=1}^{m} \bigl[\xi^{(j)}_{i}-\mu^{(j)} \bigr]\leq-\varepsilon \Biggr\} \leq\bigl(1+4e^{-2}\overline{\alpha}\bigr)\exp \biggl\{ - \frac{m^{(\alpha)}\varepsilon^{2}}{2+\frac {2}{3}\varepsilon} \biggr\} ,\quad \forall\varepsilon>0, $$
(3.7)

hence,

$$ P \Biggl\{ \min_{1\leq j\leq\mathcal{N}} \Biggl\{ \frac{1}{m}\sum _{i=1}^{m} \bigl[\xi^{(j)}_{i}- \mu^{(j)} \bigr] \Biggr\} \leq -\varepsilon \Biggr\} \leq\mathcal{N} \bigl(1+4e^{-2}\overline{\alpha }\bigr)\exp \biggl\{ -\frac{m^{(\alpha)}\varepsilon^{2}}{2+\frac {2}{3}\varepsilon} \biggr\} . $$
(3.8)

For \(0<\delta<1\), let

$$ \mathcal{N}\bigl(1+4e^{-2}\overline{\alpha}\bigr)\exp \biggl\{ - \frac {m^{(\alpha)}\varepsilon^{2}}{2+\frac{2}{3}\varepsilon} \biggr\} =\delta. $$
(3.9)

Then we get

$$\begin{aligned} \varepsilon&= \frac{\frac{2}{3}\log\frac{\mathcal{N}(1+4e^{-2}\overline{\alpha })}{\delta}+\sqrt{ (\frac{2}{3}\log\frac{\mathcal {N}(1+4e^{-2}\overline{\alpha})}{\delta} )^{2}+8m^{(\alpha )}\log\frac{\mathcal{N}(1+4e^{-2}\overline{\alpha})}{\delta }}}{2m^{(\alpha)}} \\ &\leq\frac{2\log\frac{\mathcal{N}(1+4e^{-2}\overline{\alpha })}{\delta}}{3m^{(\alpha)}}+\sqrt{\frac{2\log\frac{\mathcal {N}(1+4e^{-2}\overline{\alpha})}{\delta}}{m^{(\alpha)}}} \\ &\leq\frac{7\log\frac{\mathcal{N}(1+4e^{-2}\overline{\alpha })}{\delta}}{6m^{(\alpha)}}+1. \end{aligned}$$
(3.10)

It follows that, with confidence at least \(1-\delta\),

$$ \min_{1\leq j\leq\mathcal{N}} \Biggl\{ \frac{1}{m}\sum _{i=1}^{m} \bigl[\xi^{(j)}_{i}- \mu^{(j)} \bigr] \Biggr\} >-\frac{7\log\frac{\mathcal {N}(1+4e^{-2}\overline{\alpha})}{\delta}}{6m^{(\alpha)}}-1. $$
(3.11)

Hence, we have

$$ \frac{1}{m}\sum_{i=1}^{m} \bigl[\xi^{(j)}_{i}-\mu^{(j)} \bigr]>-\frac {7\log\frac{\mathcal{N}(1+4e^{-2}\overline{\alpha})}{\delta }}{6m^{(\alpha)}}-1,\quad \forall j=1,\ldots,\mathcal{N}. $$
(3.12)

Condition (2.1) yields \(\mu^{(j)}\geq c_{\tau} (\frac {r}{2} )^{\tau}\). Moreover, \(\xi^{(j)}(x_{i})=1\) if \(x_{i}\in B(v_{j},\frac{r}{2})\) and 0 otherwise, so that \(\frac{1}{m}\sum_{i=1}^{m}\xi^{(j)}(x_{i})=\sharp ({\mathbf{x}}\cap B(v_{j},\frac{r}{2}))/m\). Hence,

$$ \sharp\biggl({\mathbf{x}}\cap B\biggl(v_{j},\frac{r}{2}\biggr) \biggr)\big/m>c_{\tau} \biggl(\frac{r}{2} \biggr)^{\tau}- \frac{7\log\frac{\mathcal {N}(1+4e^{-2}\overline{\alpha})}{\delta}}{6m^{(\alpha)}}-1,\quad \forall j=1,\ldots,\mathcal{N}. $$
(3.13)

Observe from \(X\subseteq\bigcup_{j=1}^{\mathcal{N}}B(v_{j},\frac {r}{2})\) that for each \(x\in X\), there exists some \(j\in \{1,\ldots,\mathcal{N}\}\) such that \(x\in B(v_{j},\frac{r}{2})\), i.e., \(|v_{j}-x|\leq\frac{r}{2}\). Since \(x_{i}\in B(v_{j},\frac{r}{2})\) implies \(|x_{i}-x|\leq|x_{i}-v_{j}|+|v_{j}-x|\leq r\), we see that

$$\begin{aligned} \sharp\bigl({\mathbf{x}}\cap B(x,r)\bigr)/m&\geq\sharp\biggl({\mathbf {x}}\cap B \biggl(v_{j},\frac{r}{2}\biggr)\biggr)\big/m \\ &\geq c_{\tau} \biggl(\frac{r}{2} \biggr)^{\tau}- \frac{7\log\frac {\mathcal{N}(1+4e^{-2}\overline{\alpha})}{\delta}}{6m^{(\alpha)}}-1. \end{aligned}$$
(3.14)

This proves Lemma 3.1.  □
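
As a quick plausibility check of (3.5) (ours, not part of the proof), let \(\rho_{X}\) be the uniform distribution on \([0,1]\), for which condition (2.1) holds with \(\tau=1\) and \(c_{\tau}=1\); the empirical proportion of sample points in \(B(x,r)\) should then exceed \(c_{\tau}r^{\tau}/2^{\tau+1}\) for moderately large m.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r = 2000, 0.05
xs = rng.uniform(0.0, 1.0, m)     # rho_X uniform: (2.1) holds, tau = c_tau = 1

# Empirical proportion of samples in B(x, r) over a grid of centers x.
grid = np.linspace(0.0, 1.0, 101)
proportions = np.array([np.mean(np.abs(xs - x) <= r) for x in grid])
bound = r / 2.0 ** 2              # c_tau * r^tau / 2^(tau+1) with tau = 1

print(proportions.min(), bound)   # the minimum should exceed the bound
```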

Now we are in a position to prove Proposition 3.1.

Proof of Proposition 3.1

By (3.1) and setting \(r=C_{\mathcal{H},\zeta}\sigma^{\max\{ \zeta,1\}}\leq r_{0}\), it is easy to see that (3.4) holds. Then (3.5) is valid.

It follows from (3.5) and Definition 2.2 with σ replaced by \(\frac{\sigma}{2}\) that

$$ m_{i}/m=\sharp\bigl({\mathbf{x}}\cap B(u_{i},r) \bigr)/m>c_{\tau}r^{\tau }/2^{\tau+1}, $$
(3.15)

and

$$ |x_{i,l}-u_{i}|\leq r, $$
(3.16)

where \(\{x_{i,l}\}_{l=1}^{m_{i}}\) are the points of the set \({\mathbf{x}}\cap B(u_{i},r)\), which implies

$$ |x_{i,l}-x|\leq|x_{i,l}-u_{i}|+|u_{i}-x| \leq r+\frac{\sigma}{2}\leq \sigma, $$
(3.17)

where \(x\in X\), \(l=1,\ldots,\tilde{m}\), and \(\tilde {m}=\min_{1\leq i\leq d}\{m_{i}\}\).

Then, by (1.4), we have

$$ \Phi\biggl(\frac{x}{\sigma},\frac{x_{i,l}}{\sigma}\biggr) \geq c_{q}. $$
(3.18)

Hence

$$\begin{aligned} \frac{1}{m}\sum_{j=1}^{m}\Phi \biggl( \frac{x}{\sigma},\frac {x_{j}}{\sigma} \biggr) \bigl(f_{\mathbf{z},\sigma,x}(x_{j}) \bigr)^{2}&\geq\frac {1}{m}\sum_{i=1}^{d} \sum_{l=1}^{\widetilde{m}}\Phi \biggl( \frac {x}{\sigma},\frac{x_{i,l}}{\sigma} \biggr) \bigl(f_{\mathbf{z},\sigma ,x}(x_{i,l}) \bigr)^{2} \\ &\geq\frac{1}{m}\sum_{i=1}^{d}\sum _{l=1}^{\widetilde {m}}c_{q} \bigl(f_{\mathbf{z},\sigma,x}(x_{i,l})\bigr)^{2} \\ &\geq\frac{c_{\tau}}{2^{\tau+1}}r^{\tau}c_{q} \biggl(\frac {c_{\mathcal{H}}\sigma^{\zeta}}{2^{\zeta+1}} \biggr)^{2}\|f_{\mathbf {z},\sigma,x}\|_{C(X)}^{2}. \end{aligned}$$
(3.19)

The last inequality has been proved in Theorem 3 in [8].

Finally, combining (3.19) with the following inequality

$$ \frac{1}{m}\sum_{i=1}^{m} \Phi \biggl(\frac{x}{\sigma},\frac {x_{i}}{\sigma} \biggr) \bigl(f_{\mathbf{z},\sigma,x}(x_{i}) \bigr)^{2}\leq\frac {2}{m}\sum_{i=1}^{m} \Phi \biggl(\frac{x}{\sigma},\frac{x_{i}}{\sigma } \biggr) \bigl\{ (0-y_{i})^{2}+y_{i}^{2} \bigr\} \leq4M^{2}, $$
(3.20)

we derive the desired result. □

We also need to invoke Lemma 4 in [8], which provides the upper bound for \(f_{\mathcal{H},\sigma,x}\).

Proposition 3.3

Assume that (2.1) and (2.2) hold. Then, for some constant \(C'_{\mathcal{H},\rho _{X}}\) independent of σ, we have

$$ \|f_{\mathcal{H},\sigma,x}\|_{C(X)}\leq C'_{\mathcal{H},\rho _{X}} \sigma^{-\zeta-\max\{\frac{\tau}{2},\frac{\tau\zeta}{2}\} },\quad \forall x\in X, 0< \sigma\leq\min\{ \sigma_{0},1\}. $$
(3.21)

Next we bound the sample error. The estimate of \(\mathcal {S}(\mathbf{z},\sigma)\) relies on the following ratio probability inequality, which can be found in [27].

Proposition 3.4

Suppose that (1.7) holds. Let \(\mathcal{G}\) be a set of functions on Z and \(c>0\) such that, for each \(g\in\mathcal {G}\), \(\mu(g)=\int_{Z}g(z)\,d\rho\geq0\), \(\mu(g^{2})\leq c\mu(g)\), and \(|g(z)-\mu(g)|\leq D\) almost surely. Then, for every \(\varepsilon >0\) and \(0<\alpha\leq1\), we have

$$\begin{aligned} P \biggl\{ \sup_{g\in\mathcal{G}}\frac{\mu(g)-\frac{1}{m}\sum_{i=1}^{m}g(z_{i})}{\sqrt{\varepsilon+\mu(g)}}\geq4\alpha\sqrt { \varepsilon} \biggr\} &\leq\bigl(1+4e^{-2}\overline{\alpha} \bigr) \mathcal{N}(\mathcal{G},\alpha\varepsilon) \\ &\quad {}\times\exp \biggl\{ -\frac{\alpha^{2} m^{(\alpha)}\varepsilon }{2c+\frac{2}{3}D} \biggr\} . \end{aligned}$$
(3.22)

We obtain the upper bound estimate for \(\mathcal{S}(\mathbf{z},\sigma )\) by using Proposition 3.4.

Proposition 3.5

If the assumptions of Proposition 3.1 hold,

$$ R=\max\bigl\{ C_{\mathcal{H},\rho_{X}}, C'_{\mathcal{H},\rho _{X}}, M\bigr\} \sigma^{-\zeta-\frac{\tau}{2}\max\{1,\zeta\}}, $$
(3.23)

and

$$ m^{(\alpha)}\geq\frac{256}{3}c_{p}^{-\frac{1}{p}} \biggl(\log \frac {2+8e^{-2}\overline{\alpha}}{\delta} \biggr)^{1+\frac{1}{p}}, $$
(3.24)

then with confidence \(1-\delta\), there holds

$$\begin{aligned} & \int_{X} \bigl[ \bigl(\mathcal{E}_{x}(f_{\mathbf{z},\sigma ,x})- \mathcal{E}_{x}(f_{\mathcal{H},\sigma,x}) \bigr)- \bigl(\mathcal {E}_{\mathbf{z},x}(f_{\mathbf{z},\sigma,x})-\mathcal{E}_{\mathbf {z},x}(f_{\mathcal{H},\sigma,x}) \bigr) \bigr]\,d\rho_{X}(x) \\ &\quad \leq16 R^{2}D\sigma^{n} \biggl(\frac{256c_{p}}{3m^{(\alpha)}} \biggr)^{1/(1+p)}+\frac{1}{2} \int_{X} \bigl(\mathcal{E}_{x}(f_{\mathbf {z},\sigma,x})- \mathcal{E}_{x}(f_{\mathcal{H},\sigma,x}) \bigr)\,d\rho_{X}(x). \end{aligned}$$
(3.25)

Proof

Let \(g(u,y)\) be an arbitrary function in the set

$$\begin{aligned} \mathcal{G}_{R}&= \biggl\{ \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac {u}{\sigma} \biggr)\bigl[\bigl(f(u)-y\bigr)^{2}-\bigl(f_{\mathcal{H},\sigma ,x}(u)-y \bigr)^{2}\bigr]\,d\rho_{X}(x): \\ &\quad {}f\in B_{R}:=\{f\in\mathcal{H}:\|f\|_{C(X)}\leq R\} \biggr\} . \end{aligned}$$
(3.26)

By condition (1.5) and the bound \(c_{\rho}\) on the density function of \(\rho_{X}\), we have

$$ \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma} \biggr)\,d\rho _{X}(x)\leq c_{\rho}\widetilde{c_{q}} \int_{\mathbb{R}^{n}}\frac {\sigma^{n}}{(1+ \vert v \vert )^{ q}}\,dv\leq\frac{2\pi^{n/2}c_{\rho}\tilde {c_{q}}}{(q-n)\Gamma(\frac{n}{2})} \sigma^{n}:=D\sigma^{n}, $$
(3.27)
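
Here the last inequality follows (a short computation we include for completeness) from the change of variables \(v=(x-u)/\sigma\) and integration in spherical coordinates:

$$ \int_{\mathbb{R}^{n}}\frac{dv}{(1+ \vert v \vert )^{q}}=\frac{2\pi^{n/2}}{\Gamma(\frac{n}{2})} \int_{0}^{\infty}\frac{r^{n-1}}{(1+r)^{q}}\,dr\leq\frac{2\pi^{n/2}}{\Gamma(\frac{n}{2})} \int_{0}^{\infty}(1+r)^{n-1-q}\,dr=\frac{2\pi^{n/2}}{(q-n)\Gamma(\frac{n}{2})}, $$

where \(2\pi^{n/2}/\Gamma(\frac{n}{2})\) is the surface area of the unit sphere in \(\mathbb{R}^{n}\) and the last integral converges since \(q>n+1\).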

which implies

$$ \bigl\vert g(u,y) \bigr\vert \leq2(R+M)^{2}D\sigma^{n} \leq8R^{2}D\sigma^{n}:=c_{R}. $$
(3.28)

Hence \(|g(u,y)-\mu(g)|\leq2c_{R}\).

It follows from the Schwarz inequality that

$$\begin{aligned} \bigl\vert g(u,y) \bigr\vert ^{2}&= \biggl\vert \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac {u}{\sigma} \biggr) \bigl(f(u)-f_{\mathcal{H},\sigma,x}(u) \bigr) \bigl(f(u)+f_{\mathcal{H},\sigma,x}(u)-2y \bigr)\,d\rho_{X}(x) \biggr\vert ^{2} \\ &\leq \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma} \biggr) \bigl(f(u)-f_{\mathcal{H},\sigma,x}(u) \bigr)^{2}(2R+2M)^{2} \,d\rho _{X}(x) \\ &\quad {} \times \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma} \biggr)\,d\rho_{X}(x). \end{aligned}$$
(3.29)

By (3.27),

$$ \mu\bigl(g^{2}\bigr)\leq16R^{2}D \sigma^{n} \int_{X} \biggl( \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma} \biggr) \bigl(f(u)-f_{\mathcal {H},\sigma,x}(u) \bigr)^{2}\,d \rho_{X}(u) \biggr)\,d\rho_{X}(x). $$
(3.30)

It has been proved in [9] that

$$\begin{aligned}& \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma} \biggr) \bigl(f(u)-f_{\mathcal{H},\sigma,x}(u) \bigr)^{2}\,d \rho_{X}(u) \\& \quad = \int _{Z}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma} \biggr)\bigl[\bigl(f(u)-y\bigr)^{2} -\bigl(f_{\mathcal{H},\sigma,x}(u)-y\bigr)^{2}\bigr]\,d \rho(u,y). \end{aligned}$$
(3.31)

Substituting (3.31) into (3.30),

$$\begin{aligned} \mu\bigl(g^{2}\bigr)&\leq16R^{2}D\sigma^{n} \int_{X} \biggl( \int_{Z}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma} \biggr)\bigl[\bigl(f(u)-y\bigr)^{2} \\ &\quad {} -\bigl(f_{\mathcal{H},\sigma,x}(u)-y\bigr)^{2}\bigr]\,d\rho(u,y) \biggr)\,d\rho _{X}(x) \\ &=16R^{2}D\sigma^{n} \int_{Z} \biggl( \int_{X}\Phi \biggl(\frac {x}{\sigma},\frac{u}{\sigma} \biggr)\bigl[\bigl(f(u)-y\bigr)^{2} \\ &\quad {} -\bigl(f_{\mathcal{H},\sigma,x}(u)-y\bigr)^{2}\bigr]\,d \rho_{X}(x) \biggr)\,d\rho (u,y) \\ &=16R^{2}D\sigma^{n}\mu(g). \end{aligned}$$
(3.32)

Using Proposition 3.4 with \(\alpha=\frac{1}{4}\) and \(\mathcal{G}=\mathcal{G}_{R}\), we know that

$$\begin{aligned} &P \biggl\{ \sup_{f\in B_{R}}\frac{\int_{X} [ (\mathcal {E}_{x}(f)-\mathcal{E}_{x}(f_{\mathcal{H},\sigma,x}) )- (\mathcal{E}_{\mathbf{z},x}(f)-\mathcal{E}_{\mathbf {z},x}(f_{\mathcal{H},\sigma,x}) ) ]\,d\rho_{X}(x)}{\sqrt {\varepsilon+\int_{X} (\mathcal{E}_{x}(f)-\mathcal {E}_{x}(f_{\mathcal{H},\sigma,x}) )\,d\rho_{X}(x)}}\geq\sqrt { \varepsilon} \biggr\} \\ &\quad \leq\bigl(1+4e^{-2}\overline{\alpha} \bigr)\mathcal{N}\biggl( \mathcal{G}_{R},\frac{\varepsilon}{4}\biggr)\exp \biggl\{ - \frac{3m^{(\alpha)}\varepsilon}{2048R^{2}D\sigma^{n}} \biggr\} . \end{aligned}$$
(3.33)

Since, for any \(g_{1}, g_{2}\in\mathcal{G}_{R}\),

$$\begin{aligned} \bigl\vert g_{1}(u,y)-g_{2}(u,y) \bigr\vert &= \biggl\vert \int_{X}\Phi \biggl(\frac{x}{\sigma },\frac{u}{\sigma} \biggr) \bigl(\bigl(f_{1}(u)-y\bigr)^{2}- \bigl(f_{2}(u)-y\bigr)^{2} \bigr)\,d\rho_{X}(x) \biggr\vert \\ &\leq \biggl\vert \int_{X}\Phi \biggl(\frac{x}{\sigma},\frac{u}{\sigma } \biggr) \bigl(f_{1}(u)-f_{2}(u)\bigr) \\ &\quad {} \times\bigl(f_{1}(u)+f_{2}(u)-2y\bigr)\,d \rho_{X}(x) \biggr\vert \\ &\leq4 R D\sigma^{n} \bigl\vert f_{1}(u)-f_{2}(u) \bigr\vert , \end{aligned}$$
(3.34)

then we have

$$\begin{aligned} \mathcal{N} \biggl(\mathcal{G}_{R},\frac{\varepsilon}{4} \biggr)&\leq \mathcal{N} \biggl(B_{R},\frac{\varepsilon}{16 R D\sigma^{n}} \biggr) \\ &=\mathcal{N} \biggl(B_{1},\frac{\varepsilon}{16R^{2}D\sigma ^{n}} \biggr). \end{aligned}$$
(3.35)

It follows from (3.33) that

$$\begin{aligned} &P \biggl\{ \sup_{f\in B_{R}}\frac{\int_{X} [ (\mathcal {E}_{x}(f)-\mathcal{E}_{x}(f_{\mathcal{H},\sigma,x}) )- (\mathcal{E}_{\mathbf{z},x}(f)-\mathcal{E}_{\mathbf {z},x}(f_{\mathcal{H},\sigma,x}) ) ]\,d\rho_{X}(x)}{\sqrt {\varepsilon+\int_{X} (\mathcal{E}_{x}(f)-\mathcal {E}_{x}(f_{\mathcal{H},\sigma,x}) )\,d\rho_{X}(x)}}\leq\sqrt { \varepsilon} \biggr\} \\ &\quad \geq1-\bigl(1+4e^{-2}\overline{\alpha} \bigr)\mathcal{N} \biggl(B_{1},\frac{\varepsilon}{16R^{2}D\sigma ^{n}} \biggr)\exp \biggl\{ -\frac{3m^{(\alpha)}\varepsilon }{2048R^{2}D\sigma^{n}} \biggr\} . \end{aligned}$$
(3.36)

We set the term \((1+4e^{-2}\overline{\alpha} )\mathcal{N} (B_{1},\frac{\varepsilon}{16R^{2}D\sigma^{n}} )\exp \{-\frac{3m^{(\alpha)}\varepsilon}{2048R^{2}D\sigma ^{n}} \}\) in the above inequality equal to \(\delta/2\) and solve for ε. To this end, we invoke the following lemma, which can be proved by the same method as Proposition 4.3 in [21].

Lemma 3.2

Let \(\eta^{\ast}(m^{(\alpha)},\delta)\) be the smallest positive solution of the following inequality in η:

$$\bigl(1+4e^{-2}\overline{\alpha} \bigr)\mathcal{N}(B_{1},\eta)\exp\biggl\{ -\frac{3m^{(\alpha)}\eta}{128}\biggr\} \leq\delta. $$

If \(\log\mathcal{N}(B_{1},\eta)\leq c_{p}\eta^{-p}\) for some \(p\in(0,2)\), \(c_{p}>0\), and all \(\eta>0\), then we have

$$ \eta^{\ast}\bigl(m^{(\alpha)},\delta\bigr)\leq\max \biggl\{ \frac {256}{3m^{(\alpha)}}\log\frac{1+4e^{-2}\overline{\alpha}}{\delta }, \biggl(\frac{256c_{p}}{3m^{(\alpha)}} \biggr)^{1/(1+p)} \biggr\} . $$
(3.37)
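
The threshold \(\eta^{\ast}\) can also be located numerically: under the covering-number assumption of Lemma 3.2 the left-hand side above is strictly decreasing in η, so a bisection search applies. The sketch below (ours, with arbitrary illustrative constants) computes \(\eta^{\ast}\) and compares it with the bound (3.37).

```python
import math

def eta_star(m_alpha, delta, c_p, p, alpha_bar, tol=1e-12):
    """Smallest eta with (1+4e^-2*abar)*exp(c_p*eta^-p - 3*m*eta/128) <= delta,
    i.e., the inequality of Lemma 3.2 with log N(B_1, eta) replaced by its
    assumed upper bound c_p*eta^-p. Found by bisection (log_lhs decreases)."""
    def log_lhs(eta):
        return (math.log(1 + 4 * math.exp(-2) * alpha_bar)
                + c_p * eta ** (-p) - 3 * m_alpha * eta / 128)

    lo, hi = tol, 1.0
    while log_lhs(hi) > math.log(delta):   # grow hi until the bound holds
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if log_lhs(mid) > math.log(delta) else (lo, mid)
    return hi

m_alpha, delta, c_p, p, alpha_bar = 1000, 0.05, 1.0, 0.5, 1.0
eta = eta_star(m_alpha, delta, c_p, p, alpha_bar)
bound = max(256 / (3 * m_alpha) * math.log((1 + 4 * math.exp(-2) * alpha_bar) / delta),
            (256 * c_p / (3 * m_alpha)) ** (1 / (1 + p)))
print(eta, bound)    # eta_star should not exceed the bound of (3.37)
```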

We now return to the proof of Proposition 3.5.

It follows from Theorem 5.3 in [27] (where \(\tilde{d}\) denotes the dimension of \(\mathcal{H}\)) that

$$ \mathcal{N} \biggl(B_{1},\frac{\varepsilon}{16R^{2}D\sigma^{n}} \biggr) \textstyle\begin{cases} \leq (\frac{32 R^{2}D\sigma^{n}}{\varepsilon}+1 )^{\tilde {d}}, & \mbox{for } 0< \varepsilon< 16 R^{2}D\sigma^{n}; \\ =1, & \mbox{for } \varepsilon\geq16 R^{2}D\sigma^{n}. \end{cases} $$
(3.38)

When \(0<\varepsilon<16 R^{2}D\sigma^{n}\),

$$\begin{aligned} \begin{aligned}[b] \log\mathcal{N} \biggl(B_{1},\frac{\varepsilon}{16R^{2}D\sigma ^{n}} \biggr)&\leq\tilde{d} \log \biggl(\frac{32 R^{2}D\sigma ^{n}}{\varepsilon}+1 \biggr) \\ &\leq\frac{\tilde{d}}{p} \biggl(\frac{32 R^{2}D\sigma ^{n}}{\varepsilon} \biggr)^{p} \\ &=\frac{\tilde{d}}{p}\bigl(32 R^{2}D\sigma^{n} \bigr)^{p}\varepsilon^{-p},\quad p>0. \end{aligned} \end{aligned}$$
(3.39)

When \(\varepsilon\geq16 R^{2}D\sigma^{n}\), we have

$$ \log\mathcal{N} \biggl(B_{1},\frac{\varepsilon}{16R^{2}D\sigma^{n}} \biggr) =0. $$
(3.40)

Hence we conclude that

$$ \log\mathcal{N} \biggl(B_{1},\frac{\varepsilon}{16R^{2}D\sigma ^{n}} \biggr) \leq\frac{2^{p}\tilde{d}}{p} \biggl(\frac{\varepsilon }{16 R^{2}D\sigma^{n}} \biggr)^{-p}:=c_{p} \eta^{-p}. $$
(3.41)

This, together with (3.37), implies that, for

$$ m^{(\alpha)}\geq\frac{256}{3}c_{p}^{-\frac {1}{p}} \biggl(\log\frac{2+8e^{-2}\overline{\alpha}}{\delta} \biggr)^{1+\frac{1}{p}}, $$
(3.42)

we obtain

$$ \varepsilon\leq16 R^{2}D\sigma^{n} \biggl( \frac {256c_{p}}{3m^{(\alpha)}} \biggr)^{1/(1+p)}. $$
(3.43)

Combining (3.43) with (3.36), with confidence \(1-\delta /2\), we have

$$\begin{aligned} & \int_{X} \bigl[ \bigl(\mathcal{E}_{x}(f)- \mathcal{E}_{x}(f_{\mathcal {H},\sigma,x}) \bigr)- \bigl(\mathcal{E}_{\mathbf{z},x}(f)- \mathcal {E}_{\mathbf{z},x}(f_{\mathcal{H},\sigma,x}) \bigr) \bigr]\,d\rho _{X}(x) \\ &\quad \leq\sqrt{\varepsilon}\sqrt{\varepsilon+ \int_{X} \big(\mathcal {E}_{x}(f)-\mathcal{E}_{x}(f_{\mathcal{H},\sigma,x})\big) \,d\rho_{X}(x)} \\ &\quad \leq\varepsilon+\frac{1}{2} \int_{X} \big(\mathcal {E}_{x}(f)-\mathcal{E}_{x}(f_{\mathcal{H},\sigma,x})\big) \,d\rho _{X}(x) \\ &\quad \leq16 R^{2}D\sigma^{n} \biggl(\frac{256c_{p}}{3m^{(\alpha)}} \biggr)^{1/(1+p)}+\frac{1}{2} \int_{X} \bigl(\mathcal{E}_{x}(f)-\mathcal {E}_{x}(f_{\mathcal{H},\sigma,x}) \bigr)\,d\rho_{X}(x). \end{aligned}$$
(3.44)

Finally, setting \(f=f_{\mathbf{z},\sigma,x}\) in the above inequality (admissible since \(f_{\mathbf{z},\sigma,x}\in B_{R}\) with confidence \(1-\delta/2\) by Proposition 3.1 and (3.23)), we derive the desired result.  □

4 Proofs of the main results

In this section, we provide the proofs of Theorems 2.1 and 2.2. We first prove Theorem 2.1.

Proof

If we take \(\sigma= (m^{(\alpha)} )^{-\gamma}\), \(\gamma>0\), then we have

$$ R=\max\bigl\{ C_{\mathcal{H},\rho_{X}}, C'_{\mathcal{H},\rho_{X}}, M\bigr\} \bigl(m^{(\alpha)}\bigr)^{\gamma(\zeta+\frac{\tau}{2}\max\{1,\zeta\})}. $$
(4.1)

It is readily seen that (2.3) implies (3.1). Then Proposition 3.5 holds true. We thus obtain, with confidence \(1-\delta\),

$$ \int_{X} \bigl(\mathcal{E}_{x}(f_{\mathbf{z},\sigma,x})- \mathcal {E}_{x}(f_{\mathcal{H},\sigma,x}) \bigr)\,d\rho_{X}(x) \leq32 R^{2}D\sigma^{n} \biggl(\frac{256c_{p}}{3m^{(\alpha)}} \biggr)^{1/(1+p)}. $$
(4.2)

Therefore from (2.8) we obtain

$$\begin{aligned} \|f_{\mathbf{z}}-f_{\mathcal{H}}\|_{\rho_{X}}^{2}&\leq \widetilde {C_{\mathcal{H}}}\sigma^{-2\zeta-\tau\max\{\zeta,1\}}\times32 R^{2}D \sigma^{n} \biggl(\frac{256c_{p}}{3m^{(\alpha)}} \biggr)^{1/(1+p)} \\ &\leq\widetilde{C}\bigl(m^{(\alpha)}\bigr)^{4\gamma(\zeta+\frac{\tau }{2}\max\{\zeta,1\})-n\gamma-\frac{1}{p+1}}. \end{aligned}$$
(4.3)

This proves Theorem 2.1. □

Next, we prove Theorem 2.2.

Proof

Let \(\gamma=\varepsilon/[4\zeta+2\max\{\tau,\tau\zeta\} ]>0\) and \(p=\frac{\varepsilon}{1-\varepsilon}\). Then we have

$$ \gamma\tau\max\{1,\zeta\}< \varepsilon< 1/4 $$
(4.4)

and

$$ \bigl(m^{(\alpha)}\bigr)^{1-\gamma\tau\max\{1,\zeta\}}\geq\bigl(m^{(\alpha )}\bigr)^{1-\varepsilon}\geq \bigl(m^{(\alpha)}\bigr)^{\frac{1}{2}}. $$
(4.5)

It follows from (2.5) that

$$\begin{aligned} \bigl(m^{(\alpha)}\bigr)^{\frac{1}{2}}&\geq \biggl(A_{\tau,\zeta}+ \frac {256}{3}c_{p}^{-\frac{1}{p}} \biggr) \biggl(1+ \frac{1}{4\zeta +2\max\{\tau,\tau\zeta\}} \biggr) \\ &\quad {} \times \biggl( \biggl(\log\frac{2+8e^{-2}\overline{\alpha}}{\delta}\biggr)^{1+\frac {1}{p}}+\log m^{(\alpha)}\biggr) \\ &\geq \biggl(A_{\tau,\zeta}+\frac{256}{3}c_{p}^{-\frac{1}{p}} \biggr) \biggl(\biggl(\log\frac{2+8e^{-2}\overline{\alpha}}{\delta}\biggr)^{1+\frac {1}{p}}+\gamma\log m^{(\alpha)} \biggr), \end{aligned}$$
(4.6)

which implies that condition (2.3) of Theorem 2.1 holds true. Since \(4\gamma (\zeta+\frac{\tau}{2}\max\{\zeta,1\} )=\varepsilon\) and \(\frac{1}{p+1}=1-\varepsilon\), the exponent in (2.4) is at most \(2\varepsilon-1\), and we thus obtain, with confidence \(1-\delta\),

$$ \|f_{\mathbf{z}}-f_{\mathcal{H}}\|_{\rho_{X}}\leq\widetilde{C}^{\frac{1}{2}} \bigl(m^{(\alpha)} \bigr)^{\varepsilon-\frac{1}{2}}=:C_{2} \bigl(m^{(\alpha)} \bigr)^{\varepsilon-\frac{1}{2}}. $$
(4.7)

This proves Theorem 2.2. □