Abstract
The classification of new cases using a predictive model incurs two types of costs: testing costs and misclassification costs. Recent research efforts have produced several novel algorithms that attempt to build learners that minimize both types of cost simultaneously. In many real-life scenarios, however, we cannot afford to conduct all the tests required by the predictive model. For example, a medical center might have a fixed, predetermined budget for diagnosing each patient. For cost-bounded classification, decision trees are considered attractive because classifying a case requires only the tests along a single root-to-leaf path. In this work we present an anytime framework for producing decision-tree-based classifiers that can make accurate decisions within a strict bound on testing costs. These bounds can be known to the learner, known to the classifier but not to the learner, or not predetermined. Extensive experiments with a variety of datasets show that our proposed framework produces trees with lower misclassification costs across a wide range of testing cost bounds.
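To make the cost-bounded setting concrete, the following is a minimal sketch (not the paper's framework) of how a decision-tree classifier can respect a hard testing-cost budget: each attribute test along the path is charged against the budget, and if the next test is unaffordable the classifier falls back to the current node's majority-class label. The Node structure, attribute names, and costs are hypothetical illustrations.

```python
# Minimal sketch (not the paper's algorithm): classify a case with a decision
# tree under a hard testing-cost budget. Node, test_cost, and majority_label
# are hypothetical illustrative names.

class Node:
    def __init__(self, attribute=None, test_cost=0.0,
                 children=None, majority_label=None):
        self.attribute = attribute            # attribute tested here (None for leaves)
        self.test_cost = test_cost            # cost of performing that test
        self.children = children or {}        # attribute value -> child Node
        self.majority_label = majority_label  # fallback prediction at this node

def classify_within_budget(node, case, budget):
    """Walk the tree, paying for each test; stop if the next test is unaffordable."""
    spent = 0.0
    while node.attribute is not None:
        if spent + node.test_cost > budget:
            # Cannot afford the next test: return the best guess so far.
            return node.majority_label, spent
        spent += node.test_cost
        node = node.children[case[node.attribute]]
    return node.majority_label, spent

# Example: a tiny tree that tests 'blood_test' (cost 30) and then 'mri' (cost 200).
leaf_pos = Node(majority_label="sick")
leaf_neg = Node(majority_label="healthy")
mri = Node(attribute="mri", test_cost=200.0,
           children={"abnormal": leaf_pos, "normal": leaf_neg},
           majority_label="healthy")
root = Node(attribute="blood_test", test_cost=30.0,
            children={"high": mri, "low": leaf_neg},
            majority_label="healthy")

print(classify_within_budget(root, {"blood_test": "high", "mri": "abnormal"}, budget=100.0))
# -> ('healthy', 30.0): the MRI would exceed the remaining budget, so the fallback label is used.
```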
Cite this article
Esmeir, S., Markovitch, S. Anytime learning of anycost classifiers. Mach Learn 82, 445–473 (2011). https://doi.org/10.1007/s10994-010-5228-1