Abstract
The structure of a nonparametric system for automatic classification of large-scale statistical data is proposed and substantiated. The system comprises a technique for compressing the initial information, algorithms for automatic classification of the transformed data, and a procedure for aggregating the results. Its functions are implemented with new methods for testing hypotheses about the distributions of random variables and for discretizing the range of their values. The effectiveness of the system is illustrated by its application to assessing, from remote sensing data, the state of forests damaged by the four-eyed fir bark beetle.
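The abstract names three stages (compression, classification of the transformed data, aggregation) without giving their algorithms. The following is only a minimal sketch of how such a pipeline could be wired together, not the authors' method. Everything in it is an assumption made for illustration: the function names `compress`, `classify_cells`, and `classify`, the equal-width grid with a fixed `n_bins`, and the use of connected groups of occupied cells as a stand-in for the classification step.

```python
from collections import Counter, deque
from itertools import product


def compress(points, n_bins):
    """Stage 1 (assumed): map each point to an equal-width grid cell.

    Returns the cell index of every point and the count of points per cell,
    so later stages work on occupied cells instead of raw observations.
    """
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        idx = []
        for d in range(dims):
            # Fall back to width 1.0 when a coordinate is constant.
            width = (hi[d] - lo[d]) / n_bins or 1.0
            idx.append(min(int((p[d] - lo[d]) / width), n_bins - 1))
        return tuple(idx)

    cells = [cell_of(p) for p in points]
    return cells, Counter(cells)


def classify_cells(counts):
    """Stage 2 (assumed stand-in): label connected groups of occupied cells.

    Two cells are neighbours when every index differs by at most one.
    """
    labels = {}
    next_label = 0
    for start in sorted(counts):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in counts and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


def classify(points, n_bins=8):
    """Stage 3 (assumed): carry the cell labels back to the original points."""
    cells, counts = compress(points, n_bins)
    cell_labels = classify_cells(counts)
    return [cell_labels[c] for c in cells]


points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.1)]
labels = classify(points)
```

The point of the compression stage in this sketch is that the classification step runs over occupied cells rather than over every observation. The neighbour search enumerates 3^d offsets per cell, so this toy version is only practical in low dimensions.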
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Alexander Vasilievich Lapko. Born in 1949. Graduated from the Frunze Polytechnic Institute in 1971. Doctor of Engineering Sciences since 1990. Chief researcher at the Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences.
Research interests: nonparametric statistics; pattern recognition systems; modeling and optimization of uncertain systems.
Author of 250 scientific publications, including 18 monographs.
Full member of the International Academy of Sciences of Higher Education, member of the editorial board of the scientific journal Computer Science and Control Systems, Honored Worker of Science of the Russian Federation, Honored Worker of Higher Professional Education of the Russian Federation.
Vasily Alexandrovich Lapko. Born in 1974. Graduated from the Krasnoyarsk State Technical University in 1996. Doctor of Engineering Sciences since 2004 in the specialty of System Analysis, Management, and Information Processing. Leading researcher at the Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences. Head of the Department of Space Tools and Technologies of the Reshetnev Siberian State University of Science and Technology.
Research interests: nonparametric statistics; pattern recognition systems; modeling of uncertain systems; collective assessment methods.
Author of 205 scientific publications, including eight monographs.
Vitaly Pavlovich Tuboltsev. Born in 1998. Graduated from the Reshetnev Siberian State University of Science and Technology in 2022. Graduate student. First-category engineer in the Department of Remote Observations and Geoinformation Systems of the branch of FBI “RCFH” Centre of Forest Health of Krasnoyarsk oblast, Krasnoyarsk, Russia.
Research interests: development of information tools; nonparametric classification systems; fast decision rule optimization algorithms; remote sensing data processing.
Author of seven scientific articles.
Translated by L. A. Solovyova
Cite this article
Lapko, A.V., Lapko, V.A. & Tuboltsev, V.P. Nonparametric System for Automatic Classification of Large-Scale Statistical Data. Pattern Recognit. Image Anal. 33, 576–583 (2023). https://doi.org/10.1134/S1054661823030252
DOI: https://doi.org/10.1134/S1054661823030252