Abstract
Learning from data with both test costs and misclassification costs has been an active topic in data mining, and many decision-tree induction algorithms have been proposed for it. This paper studies a number of such algorithms and presents a competition strategy to obtain trees with lower average cost. First, we generate a population of decision trees using the λ-ID3 and EG2 algorithms, which weigh information gain against test cost. λ-ID3 is a generalization of three existing algorithms, namely ID3, IDX, and CS-ID3; EG2 is another parameterized algorithm, whose parameter range we extend in this work. Second, we post-prune these trees, trading off test cost against misclassification cost. Finally, we select the best decision tree for classification. Experimental results on the Mushroom dataset under various cost settings indicate that: 1) no single parameter value is optimal for λ-ID3 or EG2 across all settings; 2) the competition strategy is effective in selecting an appropriate decision tree; and 3) post-pruning effectively decreases the average cost.
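The attribute-selection heuristics named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes λ-ID3 scores an attribute by gain/cost^λ (so λ = 0 reduces to ID3's pure information gain, λ = 1/2 is order-equivalent to CS-ID3's gain²/cost, and λ = 1 recovers IDX's gain/cost), and uses EG2's Information Cost Function (2^gain − 1)/(cost + 1)^ω as published by Núñez. All function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting on attribute index `attr`."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in parts.values())
    return entropy(labels) - remainder

def lambda_id3_score(gain, cost, lam):
    """Assumed lambda-ID3 heuristic: gain / cost**lam.
    lam = 0 gives ID3; lam = 0.5 is order-equivalent to CS-ID3
    (gain**2 / cost); lam = 1 gives IDX (gain / cost)."""
    return gain / (cost ** lam)

def eg2_score(gain, cost, omega):
    """EG2's Information Cost Function: (2**gain - 1) / (cost + 1)**omega."""
    return (2 ** gain - 1) / ((cost + 1) ** omega)

def pick_attribute(rows, labels, costs, score, **kw):
    """Choose the attribute maximizing the given cost-sensitive score."""
    best, best_s = None, float("-inf")
    for a, c in enumerate(costs):
        s = score(info_gain(rows, labels, a), c, **kw)
        if s > best_s:
            best, best_s = a, s
    return best
```

Sweeping `lam` (or `omega`) over a grid and building one tree per value yields the population of candidate trees from which the competition strategy, after cost-based post-pruning, selects the cheapest.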
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Min, F., Zhu, W. (2012). A Competition Strategy to Cost-Sensitive Decision Trees. In: Li, T., et al. (eds.) Rough Sets and Knowledge Technology. RSKT 2012. Lecture Notes in Computer Science, vol. 7414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31900-6_45
Print ISBN: 978-3-642-31899-3
Online ISBN: 978-3-642-31900-6