Highly Scalable Attribute Selection for Averaged One-Dependence Estimators

  • Shenglei Chen
  • Ana M. Martinez
  • Geoffrey I. Webb
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8444)


Averaged One-Dependence Estimators (AODE) is a popular and effective approach to Bayesian learning. This paper proposes a new attribute selection approach for AODE. It searches a large model space yet requires only a single extra pass through the training data, resulting in a computationally efficient two-pass learning algorithm. The experimental results indicate that the new technique significantly reduces AODE’s bias at the cost of a modest increase in training time. Its low bias and computational efficiency make it an attractive algorithm for learning from big data.
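
For readers unfamiliar with the base classifier, the following is a minimal sketch of standard AODE, not of the attribute selection technique proposed in this paper. It assumes categorical attributes and simple Laplace-style smoothing; the class name `AODE` and the parameters `min_parent_count` and `smoothing` are hypothetical choices made for illustration. All counts are gathered in one pass over the training data, which is the property that makes a second, selection-oriented pass comparatively cheap.

```python
from collections import Counter


class AODE:
    """Illustrative AODE for categorical attributes (a sketch, not the paper's code)."""

    def __init__(self, min_parent_count=1, smoothing=1.0):
        self.m = min_parent_count  # minimum frequency for a value to act as parent
        self.s = smoothing         # Laplace-style smoothing constant (assumption)

    def fit(self, rows, labels):
        self.n = 0
        self.classes = set()
        self.values = [set() for _ in rows[0]]
        self.value_count = Counter()   # key: (i, x_i)
        self.parent_count = Counter()  # key: (y, i, x_i)
        self.pair_count = Counter()    # key: (y, i, x_i, j, x_j)
        # Single pass over the training data: only counts are collected.
        for x, y in zip(rows, labels):
            self.n += 1
            self.classes.add(y)
            for i, xi in enumerate(x):
                self.values[i].add(xi)
                self.value_count[(i, xi)] += 1
                self.parent_count[(y, i, xi)] += 1
                for j, xj in enumerate(x):
                    if j != i:
                        self.pair_count[(y, i, xi, j, xj)] += 1
        return self

    def joint_scores(self, x):
        """Average, over eligible parents i, of P(y, x_i) * prod_j P(x_j | y, x_i)."""
        scores = {}
        for y in sorted(self.classes):
            total, used = 0.0, 0
            for i, xi in enumerate(x):
                if self.value_count[(i, xi)] < self.m:
                    continue  # parent value too rare to give reliable estimates
                used += 1
                n_yi = self.parent_count[(y, i, xi)]
                # Smoothed estimate of P(y, x_i)
                p = (n_yi + self.s) / (
                    self.n + self.s * len(self.classes) * len(self.values[i]))
                for j, xj in enumerate(x):
                    if j != i:
                        # Smoothed estimate of P(x_j | y, x_i)
                        p *= (self.pair_count[(y, i, xi, j, xj)] + self.s) / (
                            n_yi + self.s * len(self.values[j]))
                total += p
            # Full AODE falls back to naive Bayes when no parent qualifies;
            # this sketch simply returns a zero score in that case.
            scores[y] = total / used if used else 0.0
        return scores

    def predict(self, x):
        scores = self.joint_scores(x)
        return max(scores, key=scores.get)


# Toy usage: the class is fully determined by the first attribute.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")] * 5
labels = ["pos" if r[0] == "a" else "neg" for r in rows]
model = AODE().fit(rows, labels)
print(model.predict(("a", "x")))
```

Storing the pairwise counts in a flat `Counter` keeps the sketch short; a practical implementation would more likely use dense arrays indexed by attribute and value.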


Keywords: Classification, Naive Bayes, AODE, Semi-naive Bayes, Attribute Selection





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Shenglei Chen (1, 2)
  • Ana M. Martinez (2)
  • Geoffrey I. Webb (2)

  1. College of Information Science, Nanjing Audit University, Nanjing, China
  2. Faculty of Information Technology, Monash University, Australia
