Concept Drift Detection Using Online Histogram-Based Bayesian Classifiers

  • César A. Astudillo
  • Javier I. González
  • B. John Oommen
  • Anis Yazidi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9992)

Abstract

In this paper, we present a novel algorithm that performs online histogram-based classification, i.e., one specifically designed for the case in which the data is dynamic and its distribution is non-stationary. Our method, called the Online Histogram-based Naïve Bayes Classifier (OHNBC), is a statistical classifier based on well-established Bayesian theory, but which makes certain independence assumptions about the attributes. The classifier builds its prediction model from uni-dimensional histograms whose segments, or buckets, are fixed in terms of their cardinalities but dynamic in terms of their widths. Additionally, our algorithm invokes the principles of information theory to automatically detect changes in the classifier's performance and, consequently, forces the classification model to be rebuilt at run-time as and when needed. These properties have been confirmed experimentally on numerous data sets (in the interest of space and brevity, we present here only a subset of the available results; more detailed results are found in [2]) from different domains. As far as we know, our histogram-based Naïve Bayes classification paradigm for time-varying data sets is both novel and pioneering.
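To make the abstract's central idea concrete, the following Python sketch (not the authors' implementation; all names are hypothetical) maintains per-class, per-attribute sliding windows and estimates likelihoods from equi-depth histograms, i.e., buckets of fixed cardinality but data-dependent width, as described above. The information-theoretic drift detector and model-reconstruction trigger are omitted for brevity.

```python
import bisect
import math
from collections import defaultdict, deque


class OnlineHistogramNB:
    """Minimal sketch of an online histogram-based Naive Bayes classifier.

    Each (class, attribute) pair keeps a sliding window of recent values.
    Likelihoods come from equi-depth histograms: every bucket holds the
    same number of samples (fixed cardinality) while its width adapts to
    the data distribution (dynamic width).
    """

    def __init__(self, n_buckets=10, window=200):
        self.n_buckets = n_buckets
        # (class, attribute index) -> recent attribute values
        self.windows = defaultdict(lambda: deque(maxlen=window))
        self.class_counts = defaultdict(int)
        self.total = 0

    def learn(self, x, y):
        """Incorporate one labelled sample (online update)."""
        self.class_counts[y] += 1
        self.total += 1
        for j, v in enumerate(x):
            self.windows[(y, j)].append(v)

    def _log_likelihood(self, cls, j, v):
        vals = sorted(self.windows[(cls, j)])
        if len(vals) < self.n_buckets:
            return math.log(1e-6)  # not enough data yet; near-zero density
        k = self.n_buckets
        # Equi-depth bucket edges: equal counts per bucket, variable widths.
        edges = [vals[int(i * len(vals) / k)] for i in range(k)] + [vals[-1]]
        if v < edges[0] or v > edges[-1]:
            return math.log(1e-6)  # outside the observed range
        i = min(bisect.bisect_right(edges, v) - 1, k - 1)
        width = max(edges[i + 1] - edges[i], 1e-9)
        # Each bucket holds 1/k of the class mass, spread over its width.
        return math.log((1.0 / k) / width)

    def predict(self, x):
        """Naive Bayes decision: argmax of log prior + sum of log likelihoods."""
        best, best_score = None, -math.inf
        for cls, cnt in self.class_counts.items():
            score = math.log(cnt / self.total)
            for j, v in enumerate(x):
                score += self._log_likelihood(cls, j, v)
            if score > best_score:
                best, best_score = cls, score
        return best
```

For instance, after streaming samples of two classes whose attribute values lie near 0 and near 10 respectively, `predict` assigns a query near each mode to the corresponding class. In the full scheme of the paper, a detected performance change would additionally discard the model and rebuild the histograms from fresh data.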

Keywords

Online Naïve Bayes classifier · Online learning · Concept drift · Dynamic histograms

References

  1. Abdulsalam, H., Skillicorn, D., Martin, P.: Classification using streaming random forests. IEEE Trans. Knowl. Data Eng. 23(1), 22–36 (2011)
  2. Astudillo, C.A., Gonzalez, J., Oommen, B.J., Yazidi, A.: Concept drift detection using classifiers that are online, Bayesian and histogram-based. Unabridged version of this paper (in preparation)
  3. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
  4. García-Laencina, P., Sancho-Gómez, J., Figueiras-Vidal, A.: Pattern classification with missing data: a review. Neural Comput. Appl. 19, 263–282 (2010). doi:10.1007/s00521-009-0295-6
  5. Last, M.: Online classification of nonstationary data streams. Intell. Data Anal. 6(2), 129–147 (2002)
  6. Littlestone, N.: Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Mach. Learn. 2(4), 285–318 (1988)
  7. Saffari, A., Leistner, C., Santner, J., Godec, M., Bischof, H.: On-line random forests. In: 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1393–1400, October 2009

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • César A. Astudillo (1)
  • Javier I. González (1)
  • B. John Oommen (2)
  • Anis Yazidi (3)
  1. Department of Computer Science, Universidad de Talca, Curicó, Chile
  2. School of Computer Science, Carleton University, Ottawa, Canada
  3. Department of Computer Science, Oslo and Akershus University College of Applied Sciences, Oslo, Norway