Abstract
The problem of distribution function (df) estimation arises naturally in many contexts. The empirical and the kernel df estimators are well known. There is another df estimator based on a Bernstein polynomial of degree m. For a Bernstein df estimator, the degree m plays the same role as the bandwidth in a kernel estimator. The asymptotic properties of the Bernstein estimator have so far been studied under the assumption that m is non-random, chosen subjectively. We propose algorithms for a data-driven choice of m. Such an m is a function of the data, i.e. random. We obtain the convergence rates of a Bernstein df estimator, using a random m, for i.i.d., strongly mixing and a broad class of linear processes. The estimator is shown to be consistent for any stationary, ergodic process satisfying some conditions. Using simulations and analyses of real data, the finite-sample performance of the different df estimators is compared.
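To fix ideas, the Bernstein df estimator described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the data have been rescaled to [0, 1] and uses the standard form \(\hat{F}_{m,n}(x)=\sum_{k=0}^{m} F_n(k/m)\binom{m}{k}x^k(1-x)^{m-k}\), where \(F_n\) is the empirical df (as in Babu, Canty and Chaubey 2002); the function names are ours.

```python
from math import comb

def ecdf(data, x):
    """Empirical distribution function F_n(x) = (1/n) * #{X_i <= x}."""
    return sum(1 for xi in data if xi <= x) / len(data)

def bernstein_cdf(data, x, m):
    """Bernstein polynomial df estimator of degree m for data on [0, 1]:
    F_{m,n}(x) = sum_{k=0}^m F_n(k/m) * C(m, k) * x^k * (1 - x)^(m - k).
    Smooths the ecdf by averaging it at the grid points k/m with
    binomial weights; larger m means less smoothing."""
    return sum(ecdf(data, k / m) * comb(m, k) * x**k * (1 - x)**(m - k)
               for k in range(m + 1))

# Example: a small sample on [0, 1], degree m = 10.
sample = [0.1, 0.2, 0.5, 0.9]
print(bernstein_cdf(sample, 0.5, 10))
```

At the endpoints the estimator reproduces the ecdf exactly (\(\hat{F}_{m,n}(0)=F_n(0)\), \(\hat{F}_{m,n}(1)=F_n(1)\)); the paper's contribution is choosing m from the data rather than fixing it in advance.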
References
Altman N, Léger C (1995) Bandwidth selection for kernel distribution function estimation. J Stat Plan Inference 46:195–214

Azzalini A (1981) Estimation of a distribution function and quantiles by a kernel method. Biometrika 68(1):326–328

Babu GJ, Canty AJ, Chaubey YP (2002) Application of Bernstein polynomials for smooth estimation of a distribution and density function. J Stat Plan Inference 105:377–392

Babu GJ, Chaubey YP (2006) Smooth estimation of a distribution and density function on a hypercube using Bernstein polynomials for dependent random vectors. Stat Probab Lett 76:959–969

Berkes I, Hörmann S, Schauer J (2009) Asymptotic results for the empirical process of stationary sequences. Stoch Process Appl 119:1298–1324

Billingsley P (1999) Convergence of probability measures. Wiley, New York

Bowman A, Hall P, Prvan T (1998) Bandwidth selection for the smoothing of distribution functions. Biometrika 85:799–808

Chacón JE, Rodríguez-Casal A (2010) A note on the universal consistency of the kernel distribution function estimator. Stat Probab Lett (in press). doi:10.1016/j.spl.2010.05.007

del Río AQ, Estévez-Pérez G (2012) Nonparametric kernel distribution function estimation with kerdiest: an R package for bandwidth choice and applications. J Stat Softw 50(8):1–21

Dutta S (2013) Local smoothing for kernel distribution function estimation. Commun Stat Simul Comput 43(2):378–389. doi:10.1080/03610918.2012.703360

Dutta S, Goswami A (2013) Pointwise and uniform convergence of kernel density estimators using random bandwidths. Stat Probab Lett 83:2711–2720

Falk FY (1983) Relative efficiency and deficiency of kernel type estimators of distribution functions. Stat Neerl 37(2):73–83

Feller W (1984) An introduction to probability theory and its applications, vol 2. Wiley, New York

Hesse CH (1990) Rates of convergence for the empirical distribution function and the empirical characteristic function of a broad class of linear processes. J Multivar Anal 35:186–202

Larsen TB, Nielsen JP, Guillen M, Bolance C (2005) Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics 39(6):503–518

Leblanc A (2009) Chung–Smirnov property for Bernstein estimators of distribution functions. J Nonparametr Stat 21:133–142

Leblanc A (2012a) On estimating distribution functions using Bernstein polynomials. Ann Inst Stat Math 64:919–943

Leblanc A (2012b) On the boundary properties of Bernstein polynomial estimators of density and distribution functions. J Stat Plan Inference 142:2762–2778

Merlevède F, Peligrad M, Rio E (2009) Bernstein inequality and moderate deviations under strong mixing conditions. In: High dimensional probability V, IMS Collections, Institute of Mathematical Statistics, pp 273–292. doi:10.1214/09-IMSCOLL518

Pham DT, Tran LT (1985) Some mixing properties of time series models. Stoch Process Appl 19:297–303

Rao BLSP (1983) Nonparametric functional estimation. Academic Press Inc, New York

Reiss RD (1981) Nonparametric estimation of smooth distribution functions. Scand J Stat 8(2):116–119

Serfling R (1980) Approximation theorems of mathematical statistics. Wiley, New York

Wand MP, Marron JS, Ruppert D (1991) Transformations in density estimation. J Am Stat Assoc 86(414):343–353
Acknowledgments
The author is grateful to the esteemed reviewers/referees and the Associate Editor for their detailed suggestions, which led to significant improvement of the paper. The author is thankful to Prof. Alok Goswami, ISI Kolkata, for his comments and suggestions on some of the proofs in this paper.
Appendix
Proof of Lemma 1
In this case \(||F_n-F||=\sup _{[0,1]}|F_n(x)-F(x)|\). Let us partition the interval [0, 1] into \(k_n=[n^{2/3}]+1\) subintervals \(J_{ni}=[\frac{(i-1)}{n^{2/3}}, \frac{i}{n^{2/3}}),\ i=1,2,\ldots ,[n^{2/3}],\) and \(J_{nk_n}=([n^{2/3}]/n^{2/3},\ 1]\), each of length \(n^{-2/3}\), except possibly the last, which may be shorter. Then
\(y_{i-1}\) is the lower endpoint of the interval \(J_{ni},\ i=1,\ldots ,k_n\). Under the assumption that F has a bounded density
The right-hand side of the above inequality is free of i and tends to zero as \(n\rightarrow \infty \). Therefore there exists N (depending on \(\epsilon >0\)) such that for \(n>N\)
Moreover
where \(Z^i_{nj}\!=\!I\left( X_j\le y_{i-1}\right) - P\left( X_j\!\le \! y_{i-1}\right) ,\ Y^i_{nj}=I\left( X_j\in J_{n,i}\right) - P\left( X_j\in J_{n,i}\right) , j=1,\ldots , n,\) and \(i=1,\ldots , k_n\). Therefore for \(n>N\),
If \(\{X_n\}\) is a stationary strongly mixing process with mixing coefficient \(\alpha (n)\), then for each i, \(\{Z^i_{nj},\ j=1,\ldots ,n\}\) and \(\{Y^i_{nj},\ j=1,\ldots ,n\}\) represent strongly mixing stationary sequences of mean zero bounded random variables with a sequence of mixing coefficients bounded by \(\alpha (n)\).
Under the stated conditions \(\alpha (n)\le \exp (-2cn)\) for some \(c > 0\). Under this condition, using Theorem 1 of Merlevède et al. (2009) we get that for \(\epsilon > 0\) and \(n\ge 4\),
Therefore as \(n\rightarrow \infty \)
\(\square \)
Lemma 2
Let \(\{X_n\}_{n=1,2,\ldots }\) be a strongly mixing process with differentiable marginal density f supported on [0, 1] and \(\alpha (n)<D\rho ^n\), where \(0<\rho <1\) and \(D>0\). Let \(f^{(1)}\) be continuous on [0, 1] and the kernel K satisfies Assumption A. Then for any \(x_0\) as \(n\rightarrow \infty \),
where \(f^{(1)}_n(x)=\frac{1}{nh^2}\sum ^n_{i=1} K^{(1)}\left( \frac{x-X_i}{h}\right) \ \text {and}\ h\) is a multiple of \(n^{-1/9}\).
Proof of Lemma 2
Under the stated conditions on f we see that as \(n\rightarrow \infty \)
Therefore for every \(\epsilon >0\) there exists N such that for \(n>N\)
Therefore for \(n>N\),
where \(Y_{ni}=\left[ K^{(1)}\left( \frac{x_0-X_i}{h}\right) -E\left\{ K^{(1)}\left( \frac{x_0-X_i}{h}\right) \right\} \right] ,\ i=1,\ldots ,n,\) which represents a sequence of stationary strongly mixing mean zero bounded random variables, with mixing coefficient bounded above by \(\alpha (n)\). Therefore using the Bernstein type inequality for strongly mixing processes in Merlevède et al. (2009) we get that for \(n\ge 4\)
For h equal to a multiple of \(n^{-1/9}\),
Now using Borel–Cantelli Lemma we see that \(f^{(1)}_n(x_0)\rightarrow f^{(1)}(x_0)\), almost surely, as \(n\rightarrow \infty \). \(\square \)
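The kernel derivative estimator \(f^{(1)}_n\) used in Lemma 2 can be sketched as follows. This is an illustrative sketch only: the paper's kernel K is constrained by Assumption A (not reproduced in this excerpt), and here we simply assume the standard Gaussian kernel, whose derivative is \(K^{(1)}(u)=-u\,\phi(u)\); the function names and the constant c in the bandwidth \(h=c\,n^{-1/9}\) are our choices.

```python
import math

def gauss_kernel_deriv(u):
    """Derivative K^(1)(u) = -u * phi(u) of the standard Gaussian kernel."""
    return -u * math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def density_deriv_est(data, x, c=1.0):
    """Kernel estimator of the density derivative at x, as in Lemma 2:
    f_n^(1)(x) = (1 / (n h^2)) * sum_i K^(1)((x - X_i) / h),
    with bandwidth h a multiple of n^(-1/9) (here h = c * n^(-1/9))."""
    n = len(data)
    h = c * n ** (-1 / 9)
    return sum(gauss_kernel_deriv((x - xi) / h) for xi in data) / (n * h * h)

# Example: a sample symmetric about 0.5, where f'(0.5) should be near 0.
sample = [0.2, 0.4, 0.5, 0.6, 0.8]
print(density_deriv_est(sample, 0.5))
```

The slow rate \(n^{-1/9}\) reflects that estimating a derivative needs a larger bandwidth than estimating the density itself; the Borel–Cantelli argument above then gives almost-sure convergence of \(f^{(1)}_n(x_0)\).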
Dutta, S. Distribution function estimation via Bernstein polynomial of random degree. Metrika 79, 239–263 (2016). https://doi.org/10.1007/s00184-015-0553-9