Learning and Classifying Under Hard Budgets

  • Aloak Kapoor
  • Russell Greiner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


Since resources for data acquisition are seldom infinite, both learners and classifiers must act intelligently under hard budgets. In this paper, we consider problems in which feature values are unknown to both the learner and classifier, but can be acquired at a cost. Our goal is a learner that spends its fixed learning budget b_L acquiring training data, to produce the most accurate “active classifier” that spends at most b_C per instance. To produce this fixed-budget classifier, the fixed-budget learner must sequentially decide which feature values to collect to learn the relevant information about the distribution. We explore several approaches the learner can take, including the standard “round robin” policy (purchasing every feature of every instance until the b_L budget is exhausted). We demonstrate empirically that round robin is problematic (especially for small b_L), and provide alternative learning strategies that achieve superior performance on a variety of datasets.
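As a rough illustration of the round robin baseline described above, the following Python sketch lists which feature values such a learner would buy under a fixed learning budget. It is only a sketch under stated assumptions, not the authors' implementation: the function name `round_robin_purchases`, the per-feature cost list, and the rule of stopping at the first feature that can no longer be afforded are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def round_robin_purchases(
    feature_costs: Sequence[int],
    n_instances: int,
    learning_budget: int,
) -> List[Tuple[int, int]]:
    """Return the (instance, feature) pairs bought by a round-robin learner.

    Assumptions (not from the paper): each feature has a fixed known cost,
    instances are visited in order, and the learner stops at the first
    feature value it cannot afford.
    """
    purchases: List[Tuple[int, int]] = []
    remaining = learning_budget
    for instance in range(n_instances):
        for feature, cost in enumerate(feature_costs):
            if cost > remaining:
                # Budget exhausted: no further purchases are made.
                return purchases
            remaining -= cost
            purchases.append((instance, feature))
    return purchases


if __name__ == "__main__":
    # Three features with costs 1, 2 and 3; learning budget of 8.
    print(round_robin_purchases([1, 2, 3], n_instances=3, learning_budget=8))
```

With a small budget the learner fully observes only the first few instances and sees nothing of the rest, which is one intuitive reason such a policy can struggle when b_L is small.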


Keywords: Loss Function; Class Label; Markov Decision Process; Round Robin; Dirichlet Distribution



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Aloak Kapoor (1)
  • Russell Greiner (1)
  1. Department of Computing Science, University of Alberta, Edmonton
