Abstract
The article addresses the problem of testing the hypothesis that random variables are independent when the statistical sample is large. Solving this problem is necessary when estimating the probability densities of random variables and when synthesizing information processing algorithms. A nonparametric procedure is proposed for testing the independence hypothesis on a sample containing a large amount of statistical data. The procedure compresses the initial statistical data by partitioning the range of values of the random variables into intervals. The resulting data array consists of the centers of the sampling intervals and the corresponding frequencies of observations from the original sample that fall into them. These data are used to construct a nonparametric pattern recognition algorithm that corresponds to the maximum likelihood criterion. The distribution laws in the classes are estimated under two assumptions: that the compared random variables are independent and that they are dependent. Regression estimates of the probability densities are used to recover the distribution laws of the random variables in the classes. Under these conditions, the probabilities of pattern recognition error in the classes are estimated, and the decision on the independence or dependence of the random variables is made according to the minimum of these error probabilities. The procedure was applied to the analysis of remote sensing data on forest areas; linear and nonlinear relationships between the spectral features of the studied objects were identified.
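The compression step and the two-class comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `compress` and `ml_error` are hypothetical, equal class priors are assumed, and plain cell frequencies stand in for the regression estimates of probability densities used in the article.

```python
from collections import Counter


def compress(sample, bins):
    """Compress (x, y) pairs into interval centers and observation counts."""
    xs, ys = zip(*sample)
    x_min, y_min = min(xs), min(ys)
    # Interval widths; fall back to 1.0 when a variable is constant.
    x_w = (max(xs) - x_min) / bins or 1.0
    y_w = (max(ys) - y_min) / bins or 1.0

    def index(value, low, width):
        # Clamp so that the maximum value falls into the last interval.
        return min(int((value - low) / width), bins - 1)

    counts = Counter(
        (index(x, x_min, x_w), index(y, y_min, y_w)) for x, y in sample
    )
    # Map each occupied cell to its center; the value is the cell frequency.
    return {
        (x_min + (i + 0.5) * x_w, y_min + (j + 0.5) * y_w): n
        for (i, j), n in counts.items()
    }


def ml_error(cells):
    """Error of a maximum likelihood rule that separates the joint cell
    frequencies (dependence assumed) from the product of the marginal
    frequencies (independence assumed). Equal class priors are an assumption.
    """
    total = sum(cells.values())
    px, py = Counter(), Counter()
    for (cx, cy), n in cells.items():
        px[cx] += n / total
        py[cy] += n / total
    # Sum min(joint, product) over the whole grid, including empty cells.
    return 0.5 * sum(
        min(cells.get((cx, cy), 0) / total, px[cx] * py[cy])
        for cx in px
        for cy in py
    )


# Strictly dependent data (y = x) compressed into a 4 x 4 grid.
dependent = compress([(i, i) for i in range(100)], bins=4)
# Data that fill the grid uniformly, so the joint equals the product.
independent = compress([(i, j) for i in range(4) for j in range(4)], bins=4)
print(ml_error(dependent), ml_error(independent))
```

In this sketch, an error close to 0.5 means the two classes are indistinguishable, which is consistent with independence, while a smaller error points to dependence. The decision rule of the article, which compares error probability estimates obtained under both assumptions, is not reproduced here.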
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Translated from Izmeritel’naya Tekhnika, No. 10, pp. 17–24, October 2023. Russian DOI: https://doi.org/10.32446/0368-1025it.2023-10-17-24
Original article submitted 06/02/2023. Accepted 09/09/2023
About this article
Cite this article
Lapko, A.V., Lapko, V.A. & Bakhtina, A.V. Application of a nonparametric procedure for testing the hypothesis about the independence of random variables given a large amount of statistical data. Meas Tech 66, 744–754 (2024). https://doi.org/10.1007/s11018-024-02288-z
DOI: https://doi.org/10.1007/s11018-024-02288-z
Keywords
- Hypothesis
- Independent random variables
- Hypothesis testing
- Regression estimate
- Probability density
- Pattern recognition
- Remote sensing
- Forest areas