
Multiplicative bias correction for discrete kernels


Abstract

In this paper, we prove that two multiplicative bias correction (MBC) techniques can be applied to discrete kernels in the context of probability mass function estimation. First, some properties of the MBC discrete kernel estimators (bias, variance and mean integrated squared error) are investigated. Second, the popular cross-validation technique is adapted for bandwidth selection. Finally, a simulation study and a real-data application illustrate the performance of the MBC estimators based on the Dirac discrete uniform and discrete triangular kernels.


References

  • Aitchison J, Aitken CGG (1976) Multivariate binary discrimination by the kernel method. Biometrika 63:413–420

  • Belaid N, Adjabi S, Zougab N, Kokonendji CC (2016) Bayesian bandwidth selection in discrete multivariate associated kernel estimators for probability mass functions. J Korean Stat Soc 45:557–567

  • Chu CY, Henderson DJ, Parmeter CF (2015) Plug-in bandwidth selection for kernel density estimation with discrete data. Econometrics 3:199–214

  • Funke B, Kawka R (2015) Nonparametric density estimation for multivariate bounded data using two non-negative multiplicative bias correction methods. Comput Stat Data Anal 92:148–162

  • Greene W (2011) Econometric analysis. Pearson, Cambridge

  • Hirukawa M (2010) Nonparametric multiplicative bias correction for kernel-type density estimation on the unit interval. Comput Stat Data Anal 54:473–495

  • Hirukawa M, Sakudo M (2014) Nonnegative bias reduction methods for density estimation using asymmetric kernels. Comput Stat Data Anal 75:112–123

  • Hirukawa M, Sakudo M (2015) Family of the generalised gamma kernels: a generator of asymmetric kernels for nonnegative data. J Nonparametric Stat 27:41–63

  • Jones MC, Foster PJ (1993) Generalized jackknifing and higher order kernels. J Nonparametric Stat 3:81–94

  • Jones MC, Linton O, Nielsen JP (1995) A simple bias reduction method for density estimation. Biometrika 82:327–338

  • Kokonendji CC, Senga Kiessé T (2011) Discrete associated kernels method and extensions. Stat Methodol 8:497–516

  • Kokonendji CC, Senga Kiessé T, Zocchi SS (2007) Discrete triangular distributions and non-parametric estimation for probability mass function. J Nonparametric Stat 19:241–254

  • Kokonendji CC, Somé SM (2015) On multivariate associated kernels for smoothing general density function. arXiv:1502.01173

  • Racine JS, Li Q (2004) Nonparametric estimation of regression functions with both categorical and continuous data. J Econom 119:99–130

  • Senga Kiessé T, Mizère D (2012) Weighted Poisson and semiparametric kernel models applied for parasite growth. Aust N Z J Stat 55:1–13

  • Terrell GR, Scott DW (1980) On improving convergence rates for nonnegative kernel density estimators. Ann Stat 8:1160–1163

  • Wang MC, Van Ryzin J (1981) A class of smooth estimators for discrete distributions. Biometrika 68:301–309

  • Zougab N, Adjabi S (2016) Multiplicative bias correction for generalized Birnbaum–Saunders kernel density estimators and application to nonnegative heavy tailed data. J Korean Stat Soc 45:51–63

Acknowledgements

This research was supported by the Research Unit LAMOS at the University of Bejaia. The authors thank the editor, an associate editor and the anonymous referees for their valuable comments, which allowed us to improve this article.

Author information

Corresponding author

Correspondence to Lynda Harfouche.

Appendix

We present a sketch of the proofs of Theorems 1 and 2 for the case where the discrete triangular (DT) kernel is used; the proofs for the other kernels follow along the same lines.
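
For readability, we first recall the forms of the two MBC estimators as they are used in the factorizations below (a Terrell–Scott-type correction with parameter \(c\in (0,1)\), and a Jones–Linton–Nielsen-type correction):

$$\begin{aligned} \tilde{f}_{TS,DT}(x)=\left\{ \widehat{f}_{DT,h}(x)\right\} ^{\frac{1}{1-c}}\left\{ \widehat{f}_{DT,h/c}(x)\right\} ^{-\frac{c}{1-c}}\quad \text {and}\quad \tilde{f}_{JLN,DT}(x)=\widehat{f}_{DT,h}(x)\,\psi (x), \end{aligned}$$

where \(\psi (x)\) is the leave-in correction factor defined in Sect. 1.2 below.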

1.1 Sketch of the proof of Theorem 1

1.1.1 Bias

First, note that \(\mathbb {E}\left( \widehat{f}_{DT,h}(x)\right) =\mathbb {E}(f(\mathcal {T}))\), where the random variable \(\mathcal {T}\sim DT(a;x,h)\). By using a fourth-order discrete Taylor expansion around \(\mathcal {T}=x\) for

$$\begin{aligned} I_{h}(x)=\mathbb {E}\left( \widehat{f}_{DT,h}(x)\right) =\sum \limits _{y}K_{DT,h}(y)f(y)=\mathbb {E}(f(\mathcal {T})), \end{aligned}$$

we have

$$\begin{aligned} I_{h}(x)=f(x)+\sum \limits _{j=1}^{4}\frac{f^{(j)}(x)}{j!} \mathbb {E}(\mathcal {T}-x)^{j}+o(\mathbb {E}(\mathcal {T}-x)^{4}). \end{aligned}$$

By using the properties of the discrete triangular random variable (in particular its symmetry about \(x\), which makes the odd moments vanish) and a Taylor expansion around \(h=0\), we obtain

$$\begin{aligned} \mathbb {E}(\mathcal {T}-x)= & {} 0,\\ \mathbb {E}(\mathcal {T}-x)^{2}= & {} \left\{ \log (a+1)S(a)-2\sum \limits _{k=1}^{a}k^{2}\log (k)\right\} h \\&+\left\{ \frac{\log ^{2}(a+1)}{2}S(a)-\sum \limits _{k=1}^{a}k^{2}\log ^{2} (k)\right\} h^{2}+o(h^{2}),\\ \mathbb {E}(\mathcal {T}-x)^{3}= & {} 0,\\ \mathbb {E}(\mathcal {T}-x)^{4}= & {} \left\{ \log (a+1)R(a)-2\sum \limits _{k=1}^{a}k^{4}\log (k)\right\} h \\&+\left\{ \frac{\log ^{2}(a+1)}{2}R(a)-\sum \limits _{k=1}^{a}k^{4}\log ^{2} (k)\right\} h^{2}+o(h^{2}), \end{aligned}$$

where

$$\begin{aligned} R(a)=\frac{2}{5}a^{5}+a^{4}+\frac{2}{3}a^{3}-\frac{1}{15}a. \end{aligned}$$
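
(As a quick numerical check on this closed form, and on \(S(a)\) from the main text, which by the same pattern equals \(2\sum _{k=1}^{a}k^{2}=a(a+1)(2a+1)/3\), the short Python snippet below, which is ours and purely illustrative, compares these polynomials with the raw sums.)

```python
# Illustrative check (not from the paper): R(a) = 2*sum k^4 and S(a) = 2*sum k^2.
def R(a):
    return 2 * a**5 / 5 + a**4 + 2 * a**3 / 3 - a / 15

def S(a):
    return a * (a + 1) * (2 * a + 1) / 3

for a in range(1, 10):
    assert abs(R(a) - 2 * sum(k**4 for k in range(1, a + 1))) < 1e-8
    assert abs(S(a) - 2 * sum(k**2 for k in range(1, a + 1))) < 1e-8
```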

The Taylor expansion of \(I_{h}(x)\) around \(h=0\) is then given by

$$\begin{aligned} I_{h}(x)=f(x)\left\{ 1+\frac{l_{1}(x,f)}{f(x)}h+\frac{l_{2}(x,f)}{f(x)}h^{2} +o(h^{2})\right\} , \end{aligned}$$

where

$$\begin{aligned} l_{1}(x,f)&=\frac{f^{(2)}(x)}{2}\left( \log (a+1)S(a)-2\sum \limits _{k=1}^{a} k^{2}\log (k)\right) \\&\quad + \frac{f^{(4)}(x)}{24}\left( \log (a+1)R(a)-2\sum \limits _{k=1}^{a} k^{4}\log (k)\right) \end{aligned}$$

and

$$\begin{aligned} l_{2}(x,f)&=\frac{f^{(2)}(x)}{2}\left( \frac{\log ^{2}(a+1)}{2}S(a) -\sum \limits _{k=1}^{a}k^{2}\log ^{2}(k)\right) \\&\quad + \frac{f^{(4)}(x)}{24}\left( \frac{\log ^{2}(a+1)}{2}R(a) -\sum \limits _{k=1}^{a}k^{4}\log ^{2}(k)\right) . \end{aligned}$$

Similarly, \(I_{h/c}(x)=\mathbb {E}\left( \widehat{f}_{DT,h/c}(x)\right) \) can be approximated by

$$\begin{aligned} I_{h/c}(x)=f(x)\left\{ 1+\frac{1}{c}\frac{l_{1}(x,f)}{f(x)}h +\frac{1}{c^{2}}\frac{l_{2}(x,f)}{f(x)}h^{2}+o(h^{2})\right\} . \end{aligned}$$

Now, we define the mean-zero deviations

$$\begin{aligned} \widehat{f}_{DT,h}(x)=I_{h}(x)+Z \end{aligned}$$

and

$$\begin{aligned} \widehat{f}_{DT,h/c}(x)=I_{h/c}(x)+W. \end{aligned}$$

The estimator \(\tilde{f}_{TS,DT}(x)\) can be written as follows:

$$\begin{aligned} \tilde{f}_{TS,DT}(x)= \left\{ I_{h}(x)\right\} ^{\frac{1}{1-c}}\left\{ 1+\frac{Z}{I_{h}(x)}\right\} ^{\frac{1}{1-c}} \left\{ I_{h/c}(x)\right\} ^{-\frac{c}{1-c}}\left\{ 1+\frac{W}{I_{h/c}(x)}\right\} ^{-\frac{c}{1-c}}. \end{aligned}$$

Using the expansion \((1+t)^{\alpha }=1+\alpha t+O(t^{2})\), we then have

$$\begin{aligned} \tilde{f}_{TS,DT}(x)= & {} \left\{ I_{h}(x)\right\} ^{\frac{1}{1-c}}\left\{ I_{h/c}(x)\right\} ^{-\frac{c}{1-c}} +\frac{1}{1-c}Z\left\{ \frac{I_{h}(x)}{I_{h/c}(x)}\right\} ^{\frac{c}{1-c}}\nonumber \\&-\frac{c}{1-c}W\left\{ \frac{I_{h}(x)}{I_{h/c}(x)}\right\} ^{\frac{1}{1-c}}+O\left\{ (Z+W)^{2}\right\} . \end{aligned}$$
(4)

Based on Assumption 2, noting that \(\mathbb {E}(Z)=\mathbb {E}(W)=0\), and using the same calculations as in Hirukawa (2010) and Terrell and Scott (1980), one can show that

$$\begin{aligned} \mathbb {E}\left( \tilde{f}_{TS,DT}(x)\right) =f(x)+\frac{1}{c}\left\{ \frac{l^{2}_{1}(x,f)}{2f(x)}-l_{2}(x,f)\right\} h^{2}+o(h^{2}). \end{aligned}$$

1.1.2 Variance

For the variance, from Eq. (4) we have

$$\begin{aligned} \mathrm{Var}\left( \tilde{f}_{TS,DT}(x)\right)= & {} \mathbb {E}\left( \frac{1}{1-c}Z-\frac{c}{1-c}W\right) ^{2}+o(n^{-1})\\= & {} \mathrm{Var}\left( \frac{1}{1-c}\widehat{f}_{DT,h}(x)-\frac{c}{1-c}\widehat{f}_{DT,h/c}(x)\right) +o(n^{-1})\\= & {} \frac{1}{(1-c)^{2}}\mathrm{Var}\left( \widehat{f}_{DT,h}(x)\right) +\frac{c^{2}}{(1-c)^{2}}\mathrm{Var}\left( \widehat{f}_{DT,h/c}(x)\right) \\&-\frac{2c}{(1-c)^{2}}\mathrm{cov}\left( \widehat{f}_{DT,h}(x),\widehat{f}_{DT,h/c}(x)\right) . \end{aligned}$$

First, note that the terms \(\mathrm{Var}\left( \widehat{f}_{DT,h}(x)\right) \) and \(\mathrm{Var}\left( \widehat{f}_{DT,h/c}(x)\right) \) are given by [see Kokonendji et al. (2007)]:

$$\begin{aligned} \mathrm{Var}\left( \widehat{f}_{DT,h}(x)\right)= & {} \frac{1}{n}f(x)(1-f(x))K^{2}_{DT,h}(x)+o\left( \frac{1}{n}\right) \\= & {} \frac{f(x)}{n}(1-f(x))\frac{(1+a)^{2h}}{P^{2}(a,h)}+o\left( \frac{1}{n}\right) , \end{aligned}$$

and

$$\begin{aligned} \mathrm{Var}\left( \widehat{f}_{DT,h/c}(x)\right)= & {} \frac{1}{n}f(x)(1-f(x))K^{2}_{DT,h/c}(x)+o\left( \frac{1}{n}\right) \\= & {} \frac{f(x)}{n}(1-f(x))\frac{(1+a)^{2h/c}}{P^{2}(a,h/c)}+o\left( \frac{1}{n}\right) . \end{aligned}$$

Now,

$$\begin{aligned}&\mathrm{cov}\left( \widehat{f}_{DT,h}(x),\widehat{f}_{DT,h/c}(x)\right) \\&\quad = \mathbb {E}\left( \widehat{f}_{DT,h}(x)\widehat{f}_{DT,h/c}(x)\right) -\mathbb {E}\left( \widehat{f}_{DT,h}(x)\right) \mathbb {E}\left( \widehat{f}_{DT,h/c} (x)\right) \\&\quad =\frac{1}{n^{2}}\sum \limits _{i=1}^{n}\sum \limits _{j=1}^{n}\mathbb {E} \left( K_{DT,h}(X_{i})K_{DT,h/c}(X_{j})\right) -\mathbb {E} \left( K_{DT,h}(X_{i})\right) \mathbb {E}\left( K_{DT,h/c}(X_{j})\right) \\&\quad =\frac{1}{n}\mathbb {E}\left( K_{DT,h}(X_{i})K_{DT,h/c}(X_{i})\right) +\frac{(n-1)}{n}\mathbb {E}\left( K_{DT,h}(X_{i})\right) \mathbb {E}\left( K_{DT,h/c} (X_{j})\right) \\&\qquad -\mathbb {E}\left( K_{DT,h}(X_{i})\right) \mathbb {E}\left( K_{DT,h/c}(X_{j})\right) \\&\quad =\frac{1}{n}\mathbb {E}\left( K_{DT,h}(X_{i})K_{DT,h/c}(X_{i})\right) -\frac{1}{n}\mathbb {E}\left( K_{DT,h}(X_{i})\right) \mathbb {E}\left( K_{DT,h/c} (X_{j})\right) \\&\quad =\frac{1}{n}K_{DT,h}(x)K_{DT,h/c}(x)f(x)-\frac{1}{n}K_{DT,h}(x)K_{DT,h/c} (x)f^{2}(x)+o\left( \frac{1}{n}\right) \\&\quad =\frac{1}{n}f(x)(1-f(x))\frac{(1+a)^{h}}{P(a,h)}\frac{(1+a)^{h/c}}{P(a,h/c)} +o\left( \frac{1}{n}\right) . \end{aligned}$$

Therefore, the variance of \(\tilde{f}_{TS,DT}(x)\) is given by

$$\begin{aligned} \mathrm{Var}\left( \tilde{f}_{TS,DT}(x)\right) =\frac{f(x)(1-f(x))}{n(1-c)^{2}} \left( \frac{(1+a)^{h}}{P(a,h)}-c\frac{(1+a)^{h/c}}{P(a,h/c)}\right) ^{2}+o \left( \frac{1}{n}\right) , \end{aligned}$$

which corresponds to the results in Theorem 1.
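
Before turning to Theorem 2, a minimal numerical sketch may help fix ideas. The Python snippet below is ours and purely illustrative (the paper provides no code): it implements the discrete triangular kernel of Kokonendji et al. (2007), whose normalization reproduces the modal value \((1+a)^{h}/P(a,h)\) used above, together with the TS-type corrected estimator. All function names, the simulated sample, and the settings of \(h\), \(a\) and \(c\) are our own choices.

```python
import numpy as np

def dt_kernel(y, x, h, a):
    """Discrete triangular kernel DT(a; x, h) evaluated at integer point(s) y.

    K(y) is proportional to (a+1)^h - |y - x|^h on the support
    {x-a, ..., x+a}, normalized by P(a, h); the modal value is
    (a+1)^h / P(a, h), matching the expressions in the proofs.
    """
    z = np.arange(-a, a + 1, dtype=float)
    p_ah = np.sum((a + 1.0) ** h - np.abs(z) ** h)  # normalizing constant P(a, h)
    d = np.abs(np.asarray(y, dtype=float) - x)
    return np.where(d <= a, ((a + 1.0) ** h - d ** h) / p_ah, 0.0)

def f_hat_dt(x, sample, h, a):
    """Standard DT kernel pmf estimator \\hat f_{DT,h}(x)."""
    return float(np.mean(dt_kernel(sample, x, h, a)))

def f_tilde_ts(x, sample, h, a, c=0.5):
    """TS-type MBC estimator: f_h^{1/(1-c)} * f_{h/c}^{-c/(1-c)}, c in (0, 1)."""
    f_h = f_hat_dt(x, sample, h, a)
    f_hc = f_hat_dt(x, sample, h / c, a)
    return f_h ** (1.0 / (1.0 - c)) * f_hc ** (-c / (1.0 - c))

# Toy usage on simulated Poisson counts (all settings arbitrary):
rng = np.random.default_rng(1)
data = rng.poisson(lam=3.0, size=300)
print([round(f_tilde_ts(x, data, h=0.15, a=3), 4) for x in range(9)])
```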

1.2 Sketch of the proof of Theorem 2

1.2.1 Bias

First, the estimator \(\tilde{f}_{JLN,DT}(x)\) can be written as [see Hirukawa (2010)]

$$\begin{aligned} \tilde{f}_{JLN,DT}(x)=f(x)\left\{ 1+\frac{\widehat{f}_{DT}(x)-f(x)}{f(x)} \right\} \left\{ 1+(\psi (x)-1)\right\} , \end{aligned}$$

where \(\psi (x)=n^{-1}\sum _{i=1}^{n}K_{DT,h}(X_{i})/\widehat{f}_{DT}(X_{i})\). Then, we have

$$\begin{aligned} \mathbb {E}\left( \tilde{f}_{JLN,DT}(x)\right)= & {} f(x)+f(x)\mathbb {E} \left\{ \frac{\widehat{f}_{DT}(x)-f(x)}{f(x)}\right\} +f(x) \mathbb {E}\left\{ \psi (x)-1\right\} \\&+f(x)\mathbb {E}\left\{ \left( \frac{\widehat{f}_{DT}(x)-f(x)}{f(x)}\right) \left( \psi (x)-1\right) \right\} . \end{aligned}$$

By using Assumption 2 and the properties of DT random variables, the terms \(\mathbb {E}\left\{ \frac{\widehat{f}_{DT}(x)-f(x)}{f(x)}\right\} \), \(\mathbb {E}\left\{ \psi (x)-1\right\} \) and \(\mathbb {E}\left\{ \left( \frac{\widehat{f}_{DT}(x)-f(x)}{f(x)}\right) \left( \psi (x)-1\right) \right\} \) can be approximated by following the same procedure as in Hirukawa (2010). Thus, \(\mathbb {E}(\tilde{f}_{JLN,DT}(x))\) is approximated by

$$\begin{aligned} \mathbb {E}(\tilde{f}_{JLN,DT}(x))= & {} f(x)-f(x)\left[ \frac{1}{2}\left\{ \log (a+1)S(a)-2\sum \limits _{k=1}^{a} k^{2}\log (k)\right\} q^{(1)}(x)\right. \\&\left. +\frac{1}{24}\left\{ \log (a+1)R(a)-2\sum \limits _{k=1}^{a}k^{4} \log (k)\right\} q^{(2)}(x)\right] h^{2}+o(h^{2}), \end{aligned}$$

where \(q(x)=l_{1}(x,f)/f(x)\) with \(l_{1}(x,f)\) given in the proof of Theorem 1.

1.2.2 Variance

Following Hirukawa (2010) and Jones et al. (1995), one can show that \(\tilde{f}_{JLN,DT}(x)\) is asymptotically equivalent to

$$\begin{aligned} \tilde{f}_{JLN,DT}(x)=f(x)\frac{1}{n}\sum \limits _{i=1}^{n}\frac{K_{DT,h} (X_{i})}{f(X_{i})}. \end{aligned}$$

It follows that

$$\begin{aligned} \mathrm{Var}\left( \tilde{f}_{JLN,DT}(x)\right)= & {} f^{2}(x)\frac{1}{n}\mathrm{Var}\left\{ \frac{K_{DT,h}(X_{i})}{f(X_{i})}\right\} \\= & {} f^{2}(x)\frac{1}{n}\left\{ \mathbb {E}\left( \frac{K^{2}_{DT,h}(X_{i})}{f^{2}(X_{i})}\right) -\left[ \mathbb {E}\left( \frac{K_{DT,h}(X_{i})}{f(X_{i})}\right) \right] ^{2}\right\} \\= & {} \frac{f^{2}(x)}{n}\left\{ \frac{K^{2}_{DT,h}(x)}{f(x)}-K^{2}_{DT,h}(x)\right\} +o\left( \frac{1}{n}\right) \\= & {} \frac{f(x)}{n}(1-f(x))\frac{(1+a)^{2h}}{P^{2}(a,h)}+o\left( \frac{1}{n}\right) . \end{aligned}$$

Therefore, we obtain the approximation for the variance given in Theorem 2. \(\square \)
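
For completeness, the JLN-type estimator admits an equally short illustrative sketch in the product form \(\widehat{f}_{DT,h}(x)\,\psi (x)\). Again, this is our own code under the same assumptions as the previous snippet (the DT kernel is restated so the snippet is self-contained), not the authors' implementation.

```python
import numpy as np

def dt_kernel(y, x, h, a):
    """Discrete triangular kernel DT(a; x, h); same construction as above."""
    z = np.arange(-a, a + 1, dtype=float)
    p_ah = np.sum((a + 1.0) ** h - np.abs(z) ** h)  # P(a, h)
    d = np.abs(np.asarray(y, dtype=float) - x)
    return np.where(d <= a, ((a + 1.0) ** h - d ** h) / p_ah, 0.0)

def f_tilde_jln(x, sample, h, a):
    """JLN-type MBC estimator: \\hat f_{DT,h}(x) * psi(x), where
    psi(x) = n^{-1} sum_i K_{DT,h}(X_i) / \\hat f_{DT,h}(X_i)."""
    f_hat = lambda t: float(np.mean(dt_kernel(sample, t, h, a)))
    # \hat f(X_i) > 0 for every observed X_i, so each ratio is well defined.
    psi = np.mean([float(dt_kernel(xi, x, h, a)) / f_hat(xi) for xi in sample])
    return f_hat(x) * psi

# Toy usage on the same kind of simulated counts as before:
rng = np.random.default_rng(1)
data = rng.poisson(lam=3.0, size=300)
print([round(f_tilde_jln(x, data, h=0.15, a=3), 4) for x in range(9)])
```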


Cite this article

Harfouche, L., Adjabi, S., Zougab, N. et al. Multiplicative bias correction for discrete kernels. Stat Methods Appl 27, 253–276 (2018). https://doi.org/10.1007/s10260-017-0395-x
