Preconditioning an Artificial Neural Network Using Naive Bayes

  • Nayyar A. Zaidi
  • François Petitjean
  • Geoffrey I. Webb
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9651)

Abstract

Logistic Regression (LR) is a workhorse of the statistics community and a state-of-the-art machine learning classifier. It learns a linear model from inputs to outputs by optimizing the Conditional Log-Likelihood (CLL) of the data. Recently, it has been shown that preconditioning LR with a Naive Bayes (NB) model speeds up LR learning many-fold. One can, however, train a linear model by optimizing the Mean-Square-Error (MSE) instead of the CLL, which yields an Artificial Neural Network (ANN) with no hidden layer. In this work, we study the effect of NB preconditioning on such an ANN classifier. Because optimizing MSE instead of CLL may lead to a lower-bias classifier, it can result in better performance on big datasets. We show that NB preconditioning speeds up convergence significantly, and that optimizing a linear model with MSE does produce a lower-bias classifier than optimizing with CLL. Finally, we compare the resulting classifier's performance to that of the state-of-the-art classifier Random Forest.
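
To make the construction concrete, the sketch below reparameterizes a linear classifier over per-class NB log-probabilities and trains it by gradient descent on MSE against one-hot targets. It is a minimal illustration only, assuming discretized (categorical) attributes and a plain batch-gradient loop; the helper names (nb_log_probs, nb_features, train_mse) and all parameter choices are ours, not the authors' implementation.

    # Illustrative sketch: NB-preconditioned linear classifier trained on MSE.
    # Assumptions (not from the paper): categorical 0-indexed attributes,
    # Laplace smoothing, plain batch gradient descent.
    import numpy as np

    def nb_log_probs(X, y, n_classes, alpha=1.0):
        # Smoothed estimates of log P(c) and log P(x_i = v | c).
        n, d = X.shape
        n_vals = X.max(axis=0) + 1
        log_prior = (np.log(np.bincount(y, minlength=n_classes) + alpha)
                     - np.log(n + alpha * n_classes))
        log_cond = []
        for i in range(d):
            counts = np.stack([np.bincount(X[y == c, i], minlength=n_vals[i])
                               for c in range(n_classes)])
            log_cond.append(np.log(counts + alpha)
                            - np.log(counts.sum(axis=1, keepdims=True)
                                     + alpha * n_vals[i]))
        return log_prior, log_cond

    def nb_features(X, log_prior, log_cond, n_classes):
        # phi[n, c, 0] = log P(c); phi[n, c, i + 1] = log P(x_i | c).
        # A weight matrix of all ones over phi reproduces NB exactly.
        n, d = X.shape
        phi = np.empty((n, n_classes, d + 1))
        phi[:, :, 0] = log_prior
        for i in range(d):
            phi[:, :, i + 1] = log_cond[i][:, X[:, i]].T
        return phi

    def train_mse(phi, y, n_classes, lr=0.1, epochs=500):
        # Linear model over the NB log-probability features with a softmax
        # output, trained by gradient descent on mean-square-error against
        # one-hot targets: an ANN with no hidden layer.
        n, _, p = phi.shape
        W = np.ones((n_classes, p))            # start exactly at the NB solution
        T = np.eye(n_classes)[y]               # one-hot targets
        for _ in range(epochs):
            z = np.einsum('ncp,cp->nc', phi, W)
            z -= z.max(axis=1, keepdims=True)  # numerical stability
            P = np.exp(z)
            P /= P.sum(axis=1, keepdims=True)
            err = P - T
            # Chain rule through the softmax: dL/dz_k = P_k (err_k - err . P)
            dz = P * (err - (err * P).sum(axis=1, keepdims=True))
            W -= lr * np.einsum('nc,ncp->cp', dz, phi) / n
        return W

    if __name__ == '__main__':
        rng = np.random.default_rng(0)
        X = rng.integers(0, 3, size=(200, 4))      # toy discretized data
        y = (X.sum(axis=1) > 4).astype(int)
        lp, lc = nb_log_probs(X, y, 2)
        W = train_mse(nb_features(X, lp, lc, 2), y, 2)

Because the weights start at all ones, the initial model is exactly naive Bayes, and training only rescales log-probabilities that are already on a sensible scale; this is the intuition behind the convergence speed-up that preconditioning is reported to deliver.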

Keywords

Logistic regression · Preconditioning · Conditional log-likelihood · Mean-square-error · WANBIA-C · Artificial neural networks

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Nayyar A. Zaidi¹
  • François Petitjean¹
  • Geoffrey I. Webb¹
  1. Faculty of Information Technology, Monash University, Melbourne, Australia
