To Select or To Weigh: A Comparative Study of Model Selection and Model Weighing for SPODE Ensembles

  • Ying Yang
  • Geoff Webb
  • Jesús Cerquides
  • Kevin Korb
  • Janice Boughton
  • Kai Ming Ting
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)

Abstract

An ensemble of Super-Parent-One-Dependence Estimators (SPODEs) offers a powerful yet simple alternative to naive Bayes classifiers, achieving significantly higher classification accuracy at a moderate cost in classification efficiency. Currently there exist two families of methodologies that ensemble candidate SPODEs for classification. One is to select only helpful SPODEs and uniformly average their probability estimates, a methodology known as model selection. The other is to assign a weight to each SPODE and linearly combine their probability estimates, a methodology known as model weighing. This paper presents a theoretical and empirical study comparing model selection and model weighing for ensembling SPODEs. The focus is on maximizing the ensemble’s classification accuracy while minimizing its computational time. A number of representative selection and weighing schemes are studied, providing comprehensive coverage of this topic and identifying effective schemes that offer alternative trade-offs between speed and expected error.
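The two families of methodologies can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names and the toy per-SPODE probability estimates are hypothetical, assuming each candidate SPODE has already produced a class-probability estimate for the instance being classified.

```python
def select_and_average(spode_estimates, selected):
    """Model selection: uniformly average the estimates of the chosen SPODEs."""
    chosen = [spode_estimates[i] for i in selected]
    n_classes = len(chosen[0])
    return [sum(est[c] for est in chosen) / len(chosen)
            for c in range(n_classes)]

def weigh_and_combine(spode_estimates, weights):
    """Model weighing: linearly combine all SPODEs' estimates by weight."""
    total = sum(weights)
    n_classes = len(spode_estimates[0])
    return [sum(w * est[c] for w, est in zip(weights, spode_estimates)) / total
            for c in range(n_classes)]

# Toy example: three candidate SPODEs, two classes.
estimates = [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]]
print(select_and_average(estimates, selected=[0, 2]))
print(weigh_and_combine(estimates, weights=[0.5, 0.3, 0.2]))
```

Model selection reduces to model weighing with weights of 1 for selected SPODEs and 0 for the rest; the schemes studied in the paper differ in how the selected subset or the weights are chosen (e.g., by cross validation, information criteria, or Bayesian model averaging).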

Keywords

Cross Validation; Bayesian Information Criterion; Class Label; Training Instance; Bayesian Model Average

References

  1. Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Machine Learning 29(2), 131–163 (1997)
  2. Keogh, E.J., Pazzani, M.J.: Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches. In: Proceedings of the International Workshop on Artificial Intelligence and Statistics, pp. 225–230 (1999)
  3. Keogh, E.J., Pazzani, M.J.: Learning the structure of augmented Bayesian classifiers. International Journal on Artificial Intelligence Tools 11(4), 587–601 (2002)
  4. Kittler, J.: Feature selection and extraction. In: Young, T.Y., Fu, K.S. (eds.) Handbook of Pattern Recognition and Image Processing, New York (1986)
  5. Kohavi, R.: Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the 2nd SIGKDD, pp. 202–207 (1996)
  6. Kononenko, I.: Semi-naive Bayesian classifier. In: Proceedings of the 6th European Working Session on Machine Learning, pp. 206–219 (1991)
  7. Langley, P.: Induction of recursive Bayesian classifiers. In: Proceedings of the 4th ECML, pp. 153–164 (1993)
  8. Langley, P., Sage, S.: Induction of selective Bayesian classifiers. In: Proceedings of the 10th UAI, pp. 399–406 (1994)
  9. Pazzani, M.J.: Constructive induction of Cartesian product attributes. ISIS: Information, Statistics and Induction in Science, 66–77 (1996)
  10. Sahami, M.: Learning limited dependence Bayesian classifiers. In: Proceedings of the 2nd SIGKDD, pp. 334–338 (1996)
  11. Singh, M., Provan, G.M.: Efficient learning of selective Bayesian network classifiers. In: Proceedings of the 13th ICML, pp. 453–461 (1996)
  12. Webb, G.I.: Candidate elimination criteria for lazy Bayesian rules. In: Proceedings of the 14th Australian AI, pp. 545–556 (2001)
  13. Webb, G.I., Boughton, J., Wang, Z.: Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning 58(1), 5–24 (2005)
  14. Webb, G.I., Pazzani, M.J.: Adjusted probability naive Bayesian induction. In: Proceedings of the 11th Australian AI, pp. 285–295 (1998)
  15. Xie, Z., Hsu, W., Liu, Z., Lee, M.L.: SNNB: A selective neighborhood based naive Bayes for lazy learning. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS, vol. 2336, p. 104. Springer, Heidelberg (2002)
  16. Zheng, Z., Webb, G.I.: Lazy learning of Bayesian rules. Machine Learning 41(1), 53–84 (2000)
  17. Zheng, Z., Webb, G.I., Ting, K.M.: Lazy Bayesian rules: A lazy semi-naive Bayesian learning technique competitive to boosting decision trees. In: Proceedings of the 16th ICML, pp. 493–502 (1999)
  18. Cerquides, J., de Mántaras, R.L.: Robust Bayesian linear classifier ensembles. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS, vol. 3720, pp. 72–83. Springer, Heidelberg (2005)
  19. De Ferrari, L.: Mining housekeeping genes with a naive Bayes classifier. MSc Thesis, University of Edinburgh, School of Informatics (2005)
  20. Flikka, K., Martens, L., Vandekerckhove, J., Gevaert, K., Eidhammer, I.: Improving throughput and reliability of peptide identifications through spectrum quality evaluation. In: Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology (2005)
  21. Nikora, A.P.: Classifying requirements: Towards a more rigorous analysis of natural-language specifications. In: Proceedings of the 16th IEEE International Symposium on Software Reliability Engineering, pp. 291–300 (2005)
  22. Yang, Y., Korb, K., Ting, K.M., Webb, G.I.: Ensemble selection for superparent-one-dependence estimators. In: Proceedings of the 18th Australian AI, pp. 102–112 (2005)
  23. Zheng, F., Webb, G.I.: Efficient lazy elimination for averaged one-dependence estimators. In: Proceedings of the 23rd ICML (2006)
  24. Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27(3), 379–423 (1948)
  25. Schwarz, G.: Estimating the dimension of a model. Annals of Statistics 6, 461–465 (1978)
  26. Korb, K., Nicholson, A.: Bayesian Artificial Intelligence. Chapman & Hall/CRC, Boca Raton (2004)
  27. Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T.: Bayesian model averaging: A tutorial. Statistical Science 14(4), 382–417 (1999)
  28. Cooper, G.F., Herskovits, E.: A Bayesian method for constructing Bayesian belief networks from databases. In: Proceedings of the 7th UAI, pp. 86–94 (1991)
  29. Pedregal, P.: Introduction to Optimization. Texts in Applied Mathematics, vol. 46. Springer, Heidelberg (2004)
  30. Heath, M.T.: Scientific Computing: An Introductory Survey, 2nd edn. McGraw-Hill, New York (2002)
  31. Blake, C.L., Merz, C.J.: UCI repository of machine learning databases, Department of Information and Computer Science, University of California, Irvine (1998), http://www.ics.uci.edu/~mlearn/mlrepository.html
  32. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th IJCAI, pp. 1022–1027 (1993)
  33. Breiman, L.: Bias, variance and arcing classifiers, Technical Report 460, Statistics Department, University of California, Berkeley (1996)
  34. Friedman, J.H.: On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1(1), 55–77 (1997)
  35. Kohavi, R., Wolpert, D.: Bias plus variance decomposition for zero-one loss functions. In: Proceedings of the 13th ICML, pp. 275–283 (1996)
  36. Kong, E.B., Dietterich, T.G.: Error-correcting output coding corrects bias and variance. In: Proceedings of the 12th ICML, pp. 313–321 (1995)
  37. Webb, G.I.: Multiboosting: A technique for combining boosting and wagging. Machine Learning 40(2), 159–196 (2000)
  38. Moore, D.S., McCabe, G.P.: Introduction to the Practice of Statistics, 4th edn. W. H. Freeman, New York (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ying Yang (1)
  • Geoff Webb (1)
  • Jesús Cerquides (2)
  • Kevin Korb (1)
  • Janice Boughton (1)
  • Kai Ming Ting (1)
  1. Clayton School of Information Technology, Monash University, Australia
  2. Departament de Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Spain