
Selective AnDE for large data learning: a low-bias memory constrained approach

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Learning from data that are too big to fit into memory poses great challenges to currently available learning approaches. Averaged n-Dependence Estimators (AnDE) allows for flexible learning from out-of-core data by varying the value of n (the number of super parents), and is therefore especially appropriate for learning from large quantities of data. The memory requirement of AnDE, however, increases combinatorially with the number of attributes and the parameter n. In large data learning, the number of attributes is often large, and a high n is also desired to achieve low-bias classification. To achieve the lower bias of AnDE with a higher n under a reduced memory requirement, we propose a memory constrained selective AnDE algorithm that involves two passes of learning through the training examples. The first pass performs attribute selection for the super parents according to the available memory, whereas the second learns an AnDE model whose parents are drawn only from the selected attributes. Extensive experiments show that the new selective AnDE has considerably lower bias and prediction error relative to A\(n'\)DE, where \(n' = n-1\), while maintaining the same space complexity and similar time complexity. The proposed algorithm works well on categorical data; numerical data sets need to be discretized first.
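As a rough illustration of the two-pass scheme described above, the sketch below ranks candidate super parents in a first scan and collects AnDE count tables restricted to the selected attributes in a second scan. The mutual-information ranking, the cell-count memory estimate, and all function and parameter names are assumptions made for illustration only; the paper's actual selection criterion and data structures may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import comb, log

def select_super_parents(records, n_attributes, n, n_classes, n_values, memory_budget):
    """Pass 1: rank attributes by mutual information with the class (an assumed
    criterion) and keep the largest prefix whose AnDE count tables would fit in
    the given memory budget (expressed here as a number of table cells)."""
    class_counts = Counter()
    joint_counts = [Counter() for _ in range(n_attributes)]
    for x, y in records:                          # first sequential scan
        class_counts[y] += 1
        for i, v in enumerate(x):
            joint_counts[i][(v, y)] += 1
    total = sum(class_counts.values())

    def mutual_info(i):
        attr_counts = Counter()
        for (v, _), c in joint_counts[i].items():
            attr_counts[v] += c
        return sum((c / total) * log((c / total) /
                   ((attr_counts[v] / total) * (class_counts[y] / total)))
                   for (v, y), c in joint_counts[i].items())

    selected = []
    for a in sorted(range(n_attributes), key=mutual_info, reverse=True):
        k = len(selected) + 1
        # Very rough cell count for an order-n model over k super-parent
        # candidates: classes x parent combinations x parent values x children.
        cells = (n_classes * comb(k, min(n, k)) * n_values ** min(n, k)
                 * n_attributes * n_values)
        if cells > memory_budget:
            break
        selected.append(a)
    return selected

def learn_ande_counts(records, selected, n):
    """Pass 2: collect joint counts only for super-parent tuples drawn from the
    selected attributes; every attribute still acts as a child."""
    counts = defaultdict(Counter)
    for x, y in records:                          # second sequential scan
        for parents in combinations(sorted(selected), n):
            key = (parents, tuple(x[p] for p in parents), y)
            for i, v in enumerate(x):
                counts[key][(i, v)] += 1
    return counts
```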



Acknowledgments

This research has been supported by the Australian Research Council under Grant DP140100087; the Asian Office of Aerospace Research and Development, Air Force Office of Scientific Research, under Contract FA23861214030; the National Natural Science Foundation of China under Grants 61202135 and 61272209; the Natural Science Foundation of Jiangsu, China, under Grant BK20130735; the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grants 14KJB520019, 13KJB520011 and 13KJB520013; the open project program of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University; and the Priority Academic Program Development of Jiangsu Higher Education Institutions. This research has also been supported in part by the Monash e-Research Center and eSolutions-Research Support Services through the use of the Monash Campus HPC Cluster and the LIEF Grant. This research was also undertaken on the NCI National Facility in Canberra, Australia, which is supported by the Australian Commonwealth Government.

Author information


Corresponding author

Correspondence to Shenglei Chen.

Appendices

Appendix 1: Table of RMSE

See Table 8.

Table 8 RMSE of involved algorithms on 15 large data sets

Appendix 2: Table of zero-one loss

See Table 9.

Table 9 Zero-one loss of involved algorithms on 15 large data sets

Appendix 3: Table of bias and variance

See Table 10.

Table 10 Bias and variance decomposition of involved algorithms on 15 large data sets

Appendix 4: Table of computing time

See Table 11.

Table 11 Computing time of involved algorithms on 15 large data sets


Cite this article

Chen, S., Martínez, A.M., Webb, G.I. et al. Selective AnDE for large data learning: a low-bias memory constrained approach. Knowl Inf Syst 50, 475–503 (2017). https://doi.org/10.1007/s10115-016-0937-9

