Outliers Detection in Regressions by Nonparametric Parzen Kernel Estimation

  • Tomasz Galkowski
  • Andrzej Cader
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10842)


An observation that is unusual or differs from all the others is called an outlier or anomaly. Appropriate evaluation of data is a crucial problem in modelling real objects or phenomena. Problems under current investigation are often based on data mass-produced by computer systems without careful inspection or screening. Because of the sheer volume of generated and processed information (e.g. so-called Big Data), outliers often go unnoticed and can be masked. In regression, however, the situation can be more complicated. In some fields, for instance medicine and geology, and particularly seismology (earthquakes), the extremely atypical measurements are precisely the subject of interest, so identifying and evaluating them is the goal. In this paper, a nonparametric procedure based on the Parzen kernel is applied to estimate the unknown function. Cook's distance is used to evaluate which measurements in the input data set can be recognized as outliers and possibly should be removed. Anomaly detection remains an important research problem across diverse areas and application domains.
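The abstract does not give the estimator or the exact Cook's distance formula, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a Nadaraya-Watson style kernel regression with a Gaussian Parzen kernel, a hand-picked bandwidth `h`, and a Cook's-distance analogue for smoothers in which the number of parameters is replaced by the trace of the smoother matrix. All function names and the synthetic data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Gaussian Parzen kernel (an assumed choice, not taken from the paper)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def smoother_weights(xs, x0, h, skip=None):
    """Normalised kernel weights of all samples for a prediction at x0.

    The sample at index `skip`, if given, receives weight zero (leave-one-out).
    """
    raw = [0.0 if j == skip else gaussian_kernel((x0 - xj) / h)
           for j, xj in enumerate(xs)]
    total = sum(raw)
    return [w / total for w in raw]


def fit(xs, ys, h, skip=None):
    """Kernel regression estimate evaluated at every design point."""
    fitted = []
    for x0 in xs:
        w = smoother_weights(xs, x0, h, skip)
        fitted.append(sum(wj * yj for wj, yj in zip(w, ys)))
    return fitted


def cooks_distances(xs, ys, h):
    """Cook's-distance analogue: shift of the fit when sample i is left out."""
    n = len(xs)
    full = fit(xs, ys, h)
    # Effective number of parameters: trace of the smoother matrix (assumption).
    p = sum(smoother_weights(xs, xs[i], h)[i] for i in range(n))
    # Residual variance estimate with n - p effective degrees of freedom.
    s2 = sum((y - f) ** 2 for y, f in zip(ys, full)) / (n - p)
    distances = []
    for i in range(n):
        loo = fit(xs, ys, h, skip=i)
        shift = sum((a - b) ** 2 for a, b in zip(full, loo))
        distances.append(shift / (p * s2))
    return distances


# Synthetic demo data (hypothetical): a sine curve with one corrupted sample.
xs = [i / 10 for i in range(21)]
ys = [math.sin(x) for x in xs]
ys[10] += 3.0  # planted anomaly
distances = cooks_distances(xs, ys, h=0.2)
suspect = max(range(len(xs)), key=lambda i: distances[i])
print("most influential sample index:", suspect)
```

Samples whose distance is much larger than the rest are candidates for inspection. Any fixed cut-off, as well as the bandwidth, would have to be chosen for the data at hand.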


Keywords: Outlier detection, Regression, Nonparametric estimation



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Computational Intelligence, Czestochowa University of Technology, Czestochowa, Poland
  2. Information Technology Institute, University of Social Sciences, Lodz, Poland
  3. Clark University, Worcester, USA
