Nonparametric System for Automatic Classification of Large-Scale Statistical Data

  • PATTERN RECOGNITION AND IMAGE ANALYSIS AUTOMATED SYSTEMS, HARDWARE AND SOFTWARE
Pattern Recognition and Image Analysis

Abstract

The structure of a nonparametric system for automatic classification of large-scale statistical data is proposed and substantiated. The system comprises a technique for compressing the initial information, algorithms for automatic classification of the transformed data, and a procedure for aggregating the results obtained. Its functions are implemented using new methods for testing hypotheses about the distributions of random variables and for discretizing the range of their values. The effectiveness of the system is illustrated by its application to assessing, from remote sensing data, the state of forests damaged by the four-eyed fir bark beetle.
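The compression step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes the simplest possible discretization, an equal-width grid with the same number of intervals on every axis, and replaces the sample by the centers of the occupied cells together with their relative frequencies. The function name `compress_sample` and the parameter `bins_per_axis` are hypothetical; the abstract does not say how the number of intervals is chosen, so here it is simply passed in by the caller.

```python
import numpy as np


def compress_sample(x, bins_per_axis):
    """Compress an (n, k) sample into occupied grid cells.

    Assumption for illustration only: an equal-width grid with
    `bins_per_axis` intervals per feature. Returns the centers of
    the occupied cells and the fraction of points in each cell.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]

    lo = x.min(axis=0)
    hi = x.max(axis=0)
    width = (hi - lo) / bins_per_axis
    width[width == 0] = 1.0  # guard against constant features

    # Integer cell index of every point; the maximum falls on the
    # upper edge, so clip it back into the last interval.
    idx = np.floor((x - lo) / width).astype(int)
    idx = np.clip(idx, 0, bins_per_axis - 1)

    # Keep only occupied cells and count the points in each one.
    cells, counts = np.unique(idx, axis=0, return_counts=True)

    centers = lo + (cells + 0.5) * width
    freqs = counts / n
    return centers, freqs
```

A classification algorithm would then operate on the (typically far fewer) cell centers weighted by their frequencies instead of on the raw observations, which is the sense in which the data are compressed.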

Author information

Corresponding authors

Correspondence to A. V. Lapko, V. A. Lapko or V. P. Tuboltsev.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Alexander Vasilievich Lapko. Born in 1949. In 1971 graduated from the Frunze Polytechnic Institute. Doctor of Engineering Sciences since 1990. Works at the Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences as a chief researcher.

Research interests: nonparametric statistics; pattern recognition systems; modeling and optimization of uncertain systems.

Author of 250 scientific articles, including 18 monographs.

Full member of the International Academy of Sciences of Higher Education, member of the editorial board of the scientific journal Computer Science and Control Systems, Honored Worker of Science of the Russian Federation, Honored Worker of Higher Professional Education of the Russian Federation.

Vasily Alexandrovich Lapko. Born in 1974. In 1996 graduated from the Krasnoyarsk State Technical University. Doctor of Engineering Sciences since 2004 in the specialty of System Analysis, Management, and Information Processing. Works at the Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences as a leading researcher. Head of the Department of Space Tools and Technologies of the Reshetnev Siberian State Technical University.

Research interests: nonparametric statistics; pattern recognition systems; modeling of uncertain systems; collective assessment methods.

Author of 205 scientific articles, including eight monographs.

Vitaly Pavlovich Tuboltsev. Born in 1998. In 2022 graduated from the Reshetnev Siberian State University of Science and Technology. Graduate student. Works as an engineer of the first category at the department of remote observations and geoinformation systems of the branch of FBI “RCFH” Centre of Forest Health of Krasnoyarsk oblast, Krasnoyarsk, Russia.

Research interests: development of information tools, nonparametric classification systems, fast decision rule optimization algorithms, remote sensing data processing.

Author of seven scientific articles.

Translated by L. A. Solovyova

About this article

Cite this article

Lapko, A.V., Lapko, V.A. & Tuboltsev, V.P. Nonparametric System for Automatic Classification of Large-Scale Statistical Data. Pattern Recognit. Image Anal. 33, 576–583 (2023). https://doi.org/10.1134/S1054661823030252

  • DOI: https://doi.org/10.1134/S1054661823030252