Machine Learning, Volume 69, Issue 1, pp 35–53

Classifying under computational resource constraints: anytime classification using probabilistic estimators


Abstract

In many online applications of machine learning, the computational resources available for classification will vary from time to time. Most techniques are designed to operate within the constraints of the minimum expected resources and fail to utilize further resources when they are available. We propose a novel anytime classification algorithm, anytime averaged probabilistic estimators (AAPE), which is capable of delivering strong prediction accuracy with little CPU time and utilizing additional CPU time to increase classification accuracy. The idea is to run an ordered sequence of very efficient Bayesian probabilistic estimators (single improvement steps) until classification time runs out. Theoretical studies and empirical validations reveal that by properly identifying, ordering, invoking and ensembling single improvement steps, AAPE is able to accomplish accurate classification whenever it is interrupted. It is also able to output class probability estimates beyond simple 0/1-loss classifications, as well as adeptly handle incremental learning.
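
To make the abstract's description concrete, the following is a minimal, illustrative sketch (in Python) of the anytime loop it describes: a sequence of cheap Bayesian probabilistic estimators is prepared in advance, and at classification time they are evaluated one by one until the CPU-time budget runs out, with their class-probability estimates averaged. The particular estimators (a plain naive Bayes followed by super-parent one-dependence estimators), their ordering, and the simple averaging are stand-in assumptions chosen for illustration; they are not the paper's exact single improvement steps, ordering heuristic, or ensembling scheme.

import time
from collections import defaultdict

class NaiveBayes:
    """Plain naive Bayes over categorical attributes, with Laplace-smoothed counts."""
    def __init__(self, classes, n_values):
        self.classes = classes                   # list of class labels
        self.n_values = n_values                 # number of distinct values per attribute
        self.class_count = defaultdict(float)
        self.attr_count = defaultdict(float)     # (class, attr index, attr value) -> count
        self.n = 0

    def fit(self, X, y):
        for x, c in zip(X, y):
            self.n += 1
            self.class_count[c] += 1
            for i, v in enumerate(x):
                self.attr_count[(c, i, v)] += 1
        return self

    def predict_proba(self, x):
        scores = []
        for c in self.classes:
            p = (self.class_count[c] + 1) / (self.n + len(self.classes))
            for i, v in enumerate(x):
                p *= (self.attr_count[(c, i, v)] + 1) / (self.class_count[c] + self.n_values[i])
            scores.append(p)
        total = sum(scores)
        return [s / total for s in scores]

class SuperParentNB(NaiveBayes):
    """One-dependence estimator: every attribute also conditions on one 'super-parent' attribute."""
    def __init__(self, classes, n_values, parent):
        super().__init__(classes, n_values)
        self.parent = parent
        self.pair_count = defaultdict(float)     # (class, parent value) -> count
        self.cond_count = defaultdict(float)     # (class, parent value, attr index, attr value) -> count

    def fit(self, X, y):
        super().fit(X, y)
        for x, c in zip(X, y):
            pv = x[self.parent]
            self.pair_count[(c, pv)] += 1
            for i, v in enumerate(x):
                if i != self.parent:
                    self.cond_count[(c, pv, i, v)] += 1
        return self

    def predict_proba(self, x):
        pv = x[self.parent]
        scores = []
        for c in self.classes:
            p = (self.pair_count[(c, pv)] + 1) / (self.n + len(self.classes) * self.n_values[self.parent])
            for i, v in enumerate(x):
                if i != self.parent:
                    p *= (self.cond_count[(c, pv, i, v)] + 1) / (self.pair_count[(c, pv)] + self.n_values[i])
            scores.append(p)
        total = sum(scores)
        return [s / total for s in scores]

def anytime_predict_proba(estimators, x, budget_seconds):
    """Run estimators in order until the time budget expires; average their probability estimates."""
    deadline = time.monotonic() + budget_seconds
    sums, used = None, 0
    for est in estimators:
        if used > 0 and time.monotonic() >= deadline:
            break                                # interrupted: answer from the steps completed so far
        p = est.predict_proba(x)
        sums = p if sums is None else [a + b for a, b in zip(sums, p)]
        used += 1
    return [s / used for s in sums]

# Toy usage: the cheapest estimator (naive Bayes) first, then one-dependence "improvement steps".
X = [(0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1)]
y = ['a', 'a', 'b', 'b']
classes, n_values = ['a', 'b'], [2, 2, 2]
steps = [NaiveBayes(classes, n_values).fit(X, y)] + \
        [SuperParentNB(classes, n_values, p).fit(X, y) for p in range(3)]
print(anytime_predict_proba(steps, (0, 1, 0), budget_seconds=0.01))

The deadline check is deliberately skipped for the first estimator, so that an interruption at any point still yields at least one usable class-probability estimate.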

Keywords

Anytime learning · Anytime classification · Probabilistic prediction · Bayesian classifiers · Ensemble methods

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Ying Yang (1)
  • Geoff Webb (1)
  • Kevin Korb (1)
  • Kai Ming Ting (1)

  1. Clayton School of Information Technology, Monash University, Clayton, Australia
