Abstract—
This paper examines the properties of a new method for testing the hypothesis that random variables are independent. The method is based on a nonparametric pattern recognition algorithm corresponding to the maximum likelihood criterion. The distribution laws in the classes are estimated from the initial statistical data under the assumptions of independence and of dependence of the analyzed random variables. Under these conditions, estimates of the probabilities of pattern recognition errors in the classes are calculated, and the decision on the independence or dependence of the random variables is made according to the minimum of these estimates. The results of the proposed method are compared with those of the Pearson criterion and of the Pearson, Spearman, and Kendall correlation coefficients. The Pearson criterion is implemented using a formula for the optimal discretization of the range of values of a two-dimensional random variable. Computational experiments are used to study how the effectiveness of these methods changes as the dependence between the random variables becomes more complex and as the volume of initial statistical data varies.
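The decision rule outlined in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation. The following are assumptions made only for illustration: the Gaussian kernel, the rule-of-thumb bandwidths, the leave-one-out evaluation, and the names `independence_check`, `err_if_independent`, and `err_if_dependent`. The independence class is modeled by the product of two one-dimensional kernel density estimates, and the dependence class by a two-dimensional kernel estimate. Each sample point is assigned to the class with the larger estimated density, and the hypothesis with the smaller estimated error is accepted.

```python
import math
import random


def gauss(u):
    """Standard normal kernel (assumed here; the abstract does not fix a kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def std(values):
    """Sample standard deviation (assumes the values are not all equal)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def independence_check(xs, ys):
    """Return (decision, err_if_independent, err_if_dependent).

    Hypothetical helper: compares a product-of-marginals density estimate
    with a joint density estimate at every sample point (leave-one-out).
    """
    n = len(xs)
    # Assumed rule-of-thumb bandwidths: sigma * n**(-1/(d+4)).
    h1x, h1y = std(xs) * n ** (-1 / 5), std(ys) * n ** (-1 / 5)  # marginals, d = 1
    h2x, h2y = std(xs) * n ** (-1 / 6), std(ys) * n ** (-1 / 6)  # joint, d = 2
    to_dependent = 0
    for i in range(n):
        px = py = pxy = 0.0
        for j in range(n):
            if j == i:
                continue  # leave the evaluated point out of its own estimate
            px += gauss((xs[i] - xs[j]) / h1x)
            py += gauss((ys[i] - ys[j]) / h1y)
            pxy += gauss((xs[i] - xs[j]) / h2x) * gauss((ys[i] - ys[j]) / h2y)
        p_ind = (px / ((n - 1) * h1x)) * (py / ((n - 1) * h1y))
        p_dep = pxy / ((n - 1) * h2x * h2y)
        if p_dep > p_ind:  # maximum likelihood rule between the two classes
            to_dependent += 1
    # If the data were independent, points sent to the dependence class are errors,
    # and vice versa.
    err_if_independent = to_dependent / n
    err_if_dependent = (n - to_dependent) / n
    # Ties are resolved in favor of "dependent" in this sketch.
    decision = "independent" if err_if_independent < err_if_dependent else "dependent"
    return decision, err_if_independent, err_if_dependent


# Usage on synthetic, strongly dependent data: y = x + small noise.
rng = random.Random(0)
x_demo = [rng.gauss(0.0, 1.0) for _ in range(150)]
y_demo = [x + 0.1 * rng.gauss(0.0, 1.0) for x in x_demo]
demo_decision, demo_err_ind, demo_err_dep = independence_check(x_demo, y_demo)
print(demo_decision, demo_err_ind, demo_err_dep)
```

In this simplified form the two error estimates always sum to one, so the minimum rule reduces to a majority vote over the sample points. The procedure studied in the paper may estimate the class errors differently.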
Funding
This work was supported by ongoing institutional funding. No additional grants to carry out or direct this particular research were obtained.
Ethics declarations
The authors of this work declare that they have no conflicts of interest.
Additional information
Publisher’s Note.
Allerton Press remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lapko, A.V., Lapko, V.A. & Bakhtina, A.V. Comparison of the Methodology for Hypothesis Testing of the Independence of Two-Dimensional Random Variables Based on a Nonparametric Classifier. Sci. Tech. Inf. Proc. 50, 572–581 (2023). https://doi.org/10.3103/S0147688223060084
DOI: https://doi.org/10.3103/S0147688223060084