Incremental Weighted Naive Bayes Classifiers for Data Stream

  • Christophe Salperwyck
  • Vincent Lemaire
  • Carine Hue
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with a naive independence assumption: the explanatory variables (X i ) are assumed to be conditionally independent given the target variable (Y ). Despite this strong assumption, the classifier has proved very effective on many real applications and is often used for supervised classification on data streams. The naive Bayes classifier relies only on the estimation of the univariate conditional probabilities P(X i  | Y), which can be computed on a data stream using a “supervised quantiles summary.” The literature shows that the naive Bayes classifier can be improved by (1) selecting a subset of the explanatory variables or (2) weighting them. Most of these methods are designed for batch (off-line) learning: they need to store all the data in memory and/or read each example more than once, and therefore cannot be used on data streams. This paper presents a new method, based on a graphical model, which computes the weights on the input variables using a stochastic estimation. The method is incremental and produces a weighted naive Bayes classifier for data streams. It is compared to the classical naive Bayes classifier on the Large Scale Learning challenge datasets.
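The general idea of the abstract can be sketched in code: score each class by its log prior plus a per-variable weight times the log of the univariate conditional likelihood, update the weights by a stochastic gradient step on the log loss, and update the density estimates incrementally. This is a minimal illustration, not the paper's method: per-class Gaussian estimates (maintained with Welford's online algorithm) stand in for the paper's supervised quantiles summary, and plain SGD stands in for its stochastic estimation; the class name, learning rate, and all helpers are hypothetical.

```python
import math
from collections import defaultdict

class IncrementalWeightedNB:
    """Illustrative weighted naive Bayes updated one example at a time.

    Assumptions (not from the paper): Gaussian class-conditional estimates
    replace the supervised quantiles summary, and the per-variable weights
    w_i are fit by stochastic gradient descent on the log loss.
    """

    def __init__(self, n_features, lr=0.01):
        self.lr = lr
        self.w = [1.0] * n_features            # one weight per explanatory variable
        self.class_count = defaultdict(int)    # class priors from running counts
        # per (class, feature): [count, mean, M2] for Welford's online variance
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])

    def _loglik(self, c, i, x):
        # log P(X_i = x | Y = c) under the running Gaussian estimate
        cnt, mean, m2 = self.stats[(c, i)]
        var = max(m2 / cnt if cnt > 1 else 1.0, 1e-6)
        return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

    def predict_proba(self, x):
        total = sum(self.class_count.values())
        scores = {}
        for c, cnt in self.class_count.items():
            s = math.log(cnt / total)          # log prior
            for i, xi in enumerate(x):
                s += self.w[i] * self._loglik(c, i, xi)   # weighted log-likelihoods
            scores[c] = s
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def learn_one(self, x, y):
        # stochastic gradient step on the weights (log loss), done before
        # the density estimates absorb the new example
        if y in self.class_count and len(self.class_count) > 1:
            proba = self.predict_proba(x)
            for i, xi in enumerate(x):
                # d(-log P(y|x)) / dw_i = sum_c (P(c|x) - [c == y]) * loglik(c, i, x_i)
                grad = sum((proba[c] - (1.0 if c == y else 0.0)) * self._loglik(c, i, xi)
                           for c in self.class_count)
                self.w[i] -= self.lr * grad
        # incremental (Welford) update of the per-class Gaussian estimates
        self.class_count[y] += 1
        for i, xi in enumerate(x):
            st = self.stats[(y, i)]
            st[0] += 1
            delta = xi - st[1]
            st[1] += delta / st[0]
            st[2] += delta * (xi - st[1])
```

Because both the weights and the density estimates are updated from a single pass over the examples, nothing needs to be stored beyond the running statistics, which is the constraint that rules out the batch selection and weighting methods discussed in the abstract.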


Data Stream · Graphical Model · Supervised Classification · Concept Drift · Bayesian Model Averaging



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Christophe Salperwyck (1)
  • Vincent Lemaire (1)
  • Carine Hue (1)

  1. Orange Labs, Lannion, France
