Abstract
The classification of new cases using a predictive model incurs two types of costs: testing costs and misclassification costs. Recent research efforts have produced several novel algorithms that attempt to build learners that minimize both types of cost simultaneously. In many real-life scenarios, however, we cannot afford to conduct all the tests required by the predictive model. For example, a medical center might have a fixed, predetermined budget for diagnosing each patient. For cost-bounded classification, decision trees are considered attractive because classifying a case requires only the tests along a single root-to-leaf path. In this work we present an anytime framework for producing decision-tree-based classifiers that can make accurate decisions within a strict bound on testing costs. These bounds can be known to the learner, known to the classifier but not to the learner, or not predetermined. Extensive experiments with a variety of datasets show that our proposed framework produces trees with lower misclassification costs across a wide range of testing cost bounds.
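To make the cost-bounded setting concrete, the following is a minimal sketch (not the paper's framework) of how a decision-tree classifier can respect a hard testing-cost budget: each attribute test along the path is charged against the budget, and if the next test is unaffordable the classifier falls back to the current node's majority-class label. The Node structure, attribute names, and costs are hypothetical illustrations.

```python
# Minimal sketch (not the paper's algorithm): classify a case with a decision
# tree under a hard testing-cost budget. Node, test_cost, and majority_label
# are hypothetical illustrative names.

class Node:
    def __init__(self, attribute=None, test_cost=0.0,
                 children=None, majority_label=None):
        self.attribute = attribute            # attribute tested here (None for leaves)
        self.test_cost = test_cost            # cost of performing that test
        self.children = children or {}        # attribute value -> child Node
        self.majority_label = majority_label  # fallback prediction at this node

def classify_within_budget(node, case, budget):
    """Walk the tree, paying for each test; stop if the next test is unaffordable."""
    spent = 0.0
    while node.attribute is not None:
        if spent + node.test_cost > budget:
            # Cannot afford the next test: return the best guess so far.
            return node.majority_label, spent
        spent += node.test_cost
        node = node.children[case[node.attribute]]
    return node.majority_label, spent

# Example: a tiny tree that tests 'blood_test' (cost 30) and then 'mri' (cost 200).
leaf_pos = Node(majority_label="sick")
leaf_neg = Node(majority_label="healthy")
mri = Node(attribute="mri", test_cost=200.0,
           children={"abnormal": leaf_pos, "normal": leaf_neg},
           majority_label="healthy")
root = Node(attribute="blood_test", test_cost=30.0,
            children={"high": mri, "low": leaf_neg},
            majority_label="healthy")

print(classify_within_budget(root, {"blood_test": "high", "mri": "abnormal"}, budget=100.0))
# -> ('healthy', 30.0): the MRI would exceed the remaining budget, so the fallback label is used.
```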
Cite this article
Esmeir, S., Markovitch, S. Anytime learning of anycost classifiers. Mach Learn 82, 445–473 (2011). https://doi.org/10.1007/s10994-010-5228-1