Abstract
Decision tree learning algorithms and their applications represent one of the major successes of AI. Early research on these algorithms aimed to produce classification trees that were accurate. More recently, there has been recognition that in many applications maximizing accuracy alone is not adequate, since the cost of misclassification may not be symmetric and obtaining the data for classification may itself have an associated cost. This has led to significant research on the development of cost-sensitive decision tree induction algorithms. One of the seminal studies in this field used genetic algorithms to develop an algorithm known as ICET. Empirical trials have shown that ICET produces some of the best results for cost-sensitive decision tree induction. A key feature of ICET is that its genetic pool consists of genes that represent biases and parameters, which are then passed to a decision tree learner known as EG2 to generate the trees; that is, it does not use a direct encoding of trees. This paper develops a new algorithm called ECCO (Evolutionary Classifier with Cost Optimisation), based on the hypothesis that a direct representation of trees in the genetic pool leads to improvements over ICET. The paper includes an empirical evaluation of this hypothesis on four data sets, and the results show that, in general, ECCO is more cost-sensitive and effective than ICET when both test costs and misclassification costs are considered.
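The abstract's core distinction, evolving trees directly in the genetic pool rather than evolving biases fed to a learner such as EG2, can be illustrated with a minimal sketch. Everything below is hypothetical: the toy data, the test and misclassification costs, and the mutation-only evolutionary loop are invented for illustration and do not reproduce the actual ECCO implementation.

```python
import random

random.seed(0)

# Two binary tests with (invented) costs of carrying them out.
TEST_COSTS = {"A": 1.0, "B": 5.0}
# Asymmetric misclassification costs: (true, predicted) -> cost.
MISCLASS_COST = {(1, 0): 50.0, (0, 1): 10.0}

# Toy data: the true concept is "A and B".
DATA = [({"A": a, "B": b}, int(a and b)) for a in (0, 1) for b in (0, 1)]

def random_tree(depth=2):
    """Grow a random tree: a leaf is a class label, a node is (test, left, right)."""
    if depth == 0 or random.random() < 0.3:
        return random.randint(0, 1)
    test = random.choice(sorted(TEST_COSTS))
    return (test, random_tree(depth - 1), random_tree(depth - 1))

def classify(tree, example, paid=0.0):
    """Walk the tree, returning (prediction, total test cost paid on the path)."""
    if isinstance(tree, int):
        return tree, paid
    test, left, right = tree
    branch = right if example[test] else left
    return classify(branch, example, paid + TEST_COSTS[test])

def cost(tree):
    """Average test cost plus misclassification cost over the data (lower is better)."""
    total = 0.0
    for example, label in DATA:
        pred, paid = classify(tree, example)
        total += paid + MISCLASS_COST.get((label, pred), 0.0)
    return total / len(DATA)

def mutate(tree, depth=2):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if isinstance(tree, int) or random.random() < 0.5:
        return random_tree(depth)
    test, left, right = tree
    if random.random() < 0.5:
        return (test, mutate(left, depth - 1), right)
    return (test, left, mutate(right, depth - 1))

def evolve(pop_size=20, generations=30):
    """Elitist, mutation-only evolution of directly encoded trees."""
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return min(pop, key=cost)

best = evolve()
print(best, cost(best))
```

Because the fitness function charges for both tests and errors, evolution favours trees that place cheap, informative tests near the root, and can even prefer a constant-class leaf when tests cost more than the errors they prevent. In ICET, by contrast, the chromosome would hold EG2's attribute-cost biases and parameters rather than the tree structure itself.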
References
Quinlan, J.R.: Induction of Decision Trees. Machine Learning 1, 81–106 (1986)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufman, San Mateo (1993)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann (2005)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman and Hall/CRC, London (1984)
Turney, P.D.: Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm. Journal of Artificial Intelligence Research 2, 369–409 (1995)
Vadera, S.: CSNL: A cost-sensitive non-linear decision tree algorithm. ACM Transactions on Knowledge Discovery from Data 4(2), 1–25 (2010)
Turney, P.D.: Types of cost in inductive concept learning. In: Proc. of the Workshop on Cost-Sensitive Learning, 17th Int. Conf. on Machine Learning, pp. 15–21 (2000)
Tan, M., Schlimmer, J.: Cost-Sensitive Concept Learning of Sensor use in Approach and Recognition. In: Proceedings of the 6th International Workshop on Machine Learning. ML 1989, Ithaca, New York, pp. 392–395 (1989)
Norton, S.W.: Generating Better Decision Trees. In: Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, IJCAI 1989, Detroit, Michigan, USA, pp. 800–805 (August 1989)
Núñez, M.: The Use of Background Knowledge in Decision Tree Induction. Machine Learning 6, 231–250 (1991)
Davis, J.V., Ha, J., Rossbach, C.J., Ramadan, H.E., Witchel, E.: Cost-Sensitive Decision Tree Learning for Forensic Classification. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 622–629. Springer, Heidelberg (2006)
Liu, X.: A New Cost-Sensitive Decision Tree with Missing Values. Asian Journal of Information Technology 6(11), 1083–1090 (2007)
Freitas, A., Costa-Pereira, A., Brazdil, P.: Cost-Sensitive Decision Trees Applied to Medical Data. In: Song, I. Y., Eder, J., Nguyen, T.M. (eds.) DaWaK 2007. LNCS, vol. 4654, pp. 303–312. Springer, Heidelberg (2007)
Domingos, P.: MetaCost: A general method for making classifiers cost-sensitive. In: Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 155–164. ACM, New York (1999)
Moret, S., Langford, W., Margineantu, D.: Learning to predict channel stability using biogeomorphic features. Ecological Modelling 191(1), 47–57 (2006)
Fan, W., Stolfo, S.J., Zhang, J., Chan, P.K.: AdaCost: misclassification cost-sensitive boosting. In: 16th International Conference on Machine Learning, Bled, Slovenia, June 27-30, pp. 97–105 (1999)
Lozano, A.C., Abe, N.: Multi-class cost-sensitive boosting with p-norm loss functions. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, Las Vegas, USA, August 24-27 (2008)
Lomax, S., Vadera, S.: A Survey of Cost-Sensitive Decision Tree Induction Algorithms. To Appear in ACM Computing Surveys 45(2) (2013)
Lomax, S., Vadera, S.: An Empirical Comparison of Cost-Sensitive Decision Tree Induction Algorithms. Expert Systems: The Journal of Knowledge Engineering 28(3), 227–268 (2011)
Grefenstette, J.: Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics 16, 122–128 (1986)
Blake, C., Merz, C.: UCI Repository of Machine Learning Databases. University of California, Department of Information and Computer Science, Irvine, CA (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Copyright information
© 2012 IFIP International Federation for Information Processing
Cite this paper
Omielan, A., Vadera, S. (2012). ECCO: A New Evolutionary Classifier with Cost Optimisation. In: Shi, Z., Leake, D., Vadera, S. (eds) Intelligent Information Processing VI. IIP 2012. IFIP Advances in Information and Communication Technology, vol 385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32891-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32890-9
Online ISBN: 978-3-642-32891-6