Influence of Outliers Introduction on Predictive Models Quality

  • Mateusz Kalisch
  • Marcin Michalak
  • Marek Sikora
  • Łukasz Wróbel
  • Piotr Przystałka
Conference paper

DOI: 10.1007/978-3-319-34099-9_5

Part of the Communications in Computer and Information Science book series (CCIS, volume 613)
Cite this paper as:
Kalisch M., Michalak M., Sikora M., Wróbel Ł., Przystałka P. (2016) Influence of Outliers Introduction on Predictive Models Quality. In: Kozielski S., Mrozek D., Kasprowski P., Małysiak-Mrozek B., Kostrzewa D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham

Abstract

The paper presents the results of research on how the level of outliers in the data (train and test data considered separately) influences the quality of model predictions in a classification task. A set of 100 semi-artificial time series was considered, whose independent variables were close to real ones observed in an underground coal mining environment and whose dependent variable was generated with a decision tree. For every considered method (decision trees, naive Bayes, logistic regression and kNN), a reference model was built (no outliers in the data), and its quality was compared with the quality of two models: Out–Out (outliers in train and test data) and Non-out–Out (outliers only in test data). Fifty levels of outliers in the data were considered, from 1% to 50%. The models were compared statistically using the sign test.
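The two building blocks named in the abstract, introducing outliers at a given level and comparing paired model scores with the sign test, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not say how outliers were generated, so `inject_outliers` and its `scale` parameter are hypothetical, and the sign test shown is the standard two-sided version with ties dropped.

```python
import math
import random


def inject_outliers(values, level, rng, scale=5.0):
    """Return a copy of `values` with a fraction `level` replaced by outliers.

    Assumption (not from the paper): an outlier is the series mean shifted
    by `scale` standard deviations in a randomly chosen direction.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    out = list(values)
    k = round(level * n)  # number of points to overwrite
    for i in rng.sample(range(n), k):
        out[i] = mean + rng.choice((-1.0, 1.0)) * scale * std
    return out


def sign_test(reference_scores, model_scores):
    """Two-sided sign test on paired scores; ties are dropped.

    Returns (wins for the reference model, wins for the other model, p-value).
    """
    pairs = list(zip(reference_scores, model_scores))
    ref_wins = sum(r > m for r, m in pairs)
    mod_wins = sum(m > r for r, m in pairs)
    n = ref_wins + mod_wins
    if n == 0:
        return 0, 0, 1.0
    # Under the null hypothesis, wins follow Binomial(n, 0.5).
    k = min(ref_wins, mod_wins)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return ref_wins, mod_wins, min(1.0, 2 * tail)
```

In a setup like the one described, `inject_outliers` would be applied to the test data only (Non-out–Out) or to both train and test data (Out–Out) for each level from 0.01 to 0.50, and `sign_test` would be given one quality score per time series for the reference model and for the model under comparison.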

Keywords

Data analysis, Classification, Outlier detection, Time series

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Mateusz Kalisch (1)
  • Marcin Michalak (2, 3)
  • Marek Sikora (2, 3)
  • Łukasz Wróbel (3)
  • Piotr Przystałka (1)

  1. Institute of Fundamentals of Machinery Design, Silesian University of Technology, Gliwice, Poland
  2. Institute of Informatics, Silesian University of Technology, Gliwice, Poland
  3. Institute of Innovative Technologies EMAG, Katowice, Poland
