The dependence between the sampling interval of the domain of values of a one-dimensional random variable and the blur coefficient of a kernel probability density estimate is established. The study draws on an analysis of the asymptotic properties of the Rosenblatt–Parzen nonparametric probability density estimate and its modification. The modified kernel estimate is shown to be a smoothed histogram. Optimal expressions for the kernel blur coefficient and for the length of the sampling interval are considered; both are obtained by minimizing the mean square deviations of the respective density estimates. On this basis, a relationship between the two parameters is established: it is determined by a constant and depends on the kernel function used and on the size of the initial sample. The value of this constant is characterized by the form of the reconstructed probability density and is independent of its parameters. Based on computational experiments, formulas are proposed for estimating the constant from the antikurtosis coefficient for symmetric and asymmetric distribution laws. The antikurtosis coefficient is itself estimated from the same initial statistical data used to reconstruct the probability density. The results make it possible to determine the length of the sampling interval quickly from the value of the kernel blur coefficient, which is relevant when testing hypotheses about the distributions of random variables. The conclusions are confirmed by the results of computational experiments.
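The quantities named in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' formulas, which the abstract does not state. As stand-ins it assumes a Gaussian kernel, the textbook normal-reference rules c = 1.06·σ·n^(-1/5) for the blur coefficient and β = 3.49·σ·n^(-1/3) for the interval length, and the usual definition of the antikurtosis coefficient as σ²/√μ₄. All function and variable names are hypothetical.

```python
import math
import random
import statistics


def parzen_density(x, sample, c):
    """Rosenblatt-Parzen estimate at x, Gaussian kernel, blur coefficient c."""
    n = len(sample)
    norm = 1.0 / (n * c * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / c) ** 2) for xi in sample)


def histogram_density(x, sample, beta):
    """Histogram estimate at x with intervals of length beta from min(sample)."""
    n = len(sample)
    lo = min(sample)
    target = math.floor((x - lo) / beta)
    count = sum(1 for xi in sample if math.floor((xi - lo) / beta) == target)
    return count / (n * beta)


def antikurtosis(sample):
    """Sample antikurtosis sigma^2 / sqrt(mu_4), i.e. 1 / sqrt(kurtosis)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((xi - mean) ** 2 for xi in sample) / n
    m4 = sum((xi - mean) ** 4 for xi in sample) / n
    return m2 / math.sqrt(m4)


# Synthetic standard-normal data (an assumption made for the illustration).
rng = random.Random(0)
n = 2000
data = [rng.gauss(0.0, 1.0) for _ in range(n)]
sigma = statistics.pstdev(data)

# Stand-in normal-reference rules, NOT the expressions derived in the paper.
c = 1.06 * sigma * n ** (-1 / 5)     # kernel blur coefficient
beta = 3.49 * sigma * n ** (-1 / 3)  # length of the sampling interval

print("antikurtosis:", antikurtosis(data))
print("beta / c:", beta / c)
print("kernel estimate at 0:", parzen_density(0.0, data, c))
print("histogram estimate at 0:", histogram_density(0.0, data, beta))
```

Under these stand-in rules the ratio β/c equals a constant times n^(-2/15). That has the same general shape as the relationship described above: a constant tied to the kernel and the form of the density, combined with a dependence on the sample size.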
Additional information
Translated from Izmeritel’naya Tekhnika, No. 9, pp. 3–8, September, 2019.
Cite this article
Lapko, A.V., Lapko, V.A. Dependence Between Histogram Parameters and the Kernel Estimate of a Unimodal Probability Density. Meas Tech 62, 747–753 (2019). https://doi.org/10.1007/s11018-019-01690-2
DOI: https://doi.org/10.1007/s11018-019-01690-2