Methodically Unified Procedures for Outlier Detection, Clustering and Classification

  • Piotr Kulczycki
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1069)

Abstract

In practical data analysis, the methodological variety of specific algorithms creates difficulties for multifaceted research, often leading to laborious interpretation and time-consuming study. This paper presents the concept of methodically unified procedures, based on kernel estimators, for three fundamental tasks: outlier detection, clustering, and classification. Their clear interpretation facilitates application and individual modification. The investigated procedures are distribution-free, enabling the analysis and exploration of data with any distribution, including cases where the elements are grouped into several separate parts. The results obtained depend not only on the values of particular attributes but, above all, on the complex relationships between them.
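To make the idea of a shared kernel-estimator basis more concrete, the following is a minimal sketch and not the paper's own procedure. It assumes a Gaussian product kernel, a single normal-reference (rule-of-thumb) bandwidth, a low-density quantile rule for outliers with a hypothetical order r, and a prior-weighted density comparison for classification. All function names and parameters are illustrative assumptions.

```python
import numpy as np


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth for a Gaussian kernel (an assumed choice).

    `data` is an (n, d) array; one scalar bandwidth is used for all attributes.
    """
    n, d = data.shape
    sigma = data.std(axis=0, ddof=1).mean()
    return sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


def kde(points, data, h):
    """Gaussian kernel density estimate at `points`, built from `data` (n, d)."""
    points = np.atleast_2d(points)
    n, d = data.shape
    # Scaled differences between every evaluation point and every sample.
    diff = (points[:, None, :] - data[None, :, :]) / h
    kernel = np.exp(-0.5 * np.sum(diff ** 2, axis=2)) / (2.0 * np.pi) ** (d / 2)
    return kernel.sum(axis=1) / (n * h ** d)


def detect_outliers(data, r=0.05):
    """Flag elements whose estimated density lies below the r-quantile
    of the densities at all data points (r is an assumed sensitivity)."""
    density = kde(data, data, rule_of_thumb_bandwidth(data))
    return density < np.quantile(density, r)


def classify(x, class_samples):
    """Assign x to the class with the largest prior-weighted density,
    taking priors proportional to the class sample sizes (an assumption)."""
    total = sum(len(s) for s in class_samples)
    scores = [
        len(s) / total * kde(x, s, rule_of_thumb_bandwidth(s))[0]
        for s in class_samples
    ]
    return int(np.argmax(scores))
```

Both functions rely on the same density estimate, so a change of kernel or bandwidth rule carries over to both tasks. A clustering step could reuse the estimate in a similar way, for example by moving each element along the gradient of the estimated density, but that is omitted here for brevity.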

Keywords

Outlier detection · Clustering · Classification · Distribution-free methods · Kernel estimators · Numerical algorithm

Notes

Acknowledgments

I would like to express my gratitude to my close associates and former Ph.D. students, Małgorzata Charytanowicz, D.Sc., Karina Daniel, Ph.D., Piotr A. Kowalski, D.Sc., Damian Kruszewski, Ph.D., and Szymon Łukasik, Ph.D., with whom the research summarized in this paper was conducted.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Systems Research Institute, Centre of Information Technology for Data Analysis Methods, Polish Academy of Sciences, Kraków, Poland
  2. Faculty of Physics and Applied Computer Science, Division for Information Technology and Systems Research, AGH University of Science and Technology, Kraków, Poland