Abstract
Many algorithms in decision tree learning are not designed to handle numeric-valued attributes well, so the continuous feature space has to be discretized. In this article we introduce cost-sensitive discretization as a preprocessing step to the induction of a classifier and as an elaboration of the error-based discretization method, with the aim of obtaining an optimal multi-interval split for each numeric attribute. A transparent description of the method and of the steps involved in cost-sensitive discretization is given. We also evaluate its performance against two other well-known methods, i.e. entropy-based discretization and pure error-based discretization, on a real-life financial dataset. From the algorithmic point of view, we show that an important deficiency of error-based discretization methods can be solved by introducing costs. From the application point of view, we found that using a discretization method is recommended. To conclude, we use ROC curves to illustrate that under particular conditions cost-based discretization may be optimal.
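The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that an interval is charged the cheapest total misclassification cost of any single label assigned to it, and that the split into at most a given number of intervals is found by a generic dynamic program over the sorted distinct values. The names `interval_cost` and `best_split`, the cost-matrix convention `cost[true][predicted]`, and the choice of midpoints as cut points are all assumptions made for illustration. With a 0/1 cost matrix the sketch reduces to plain error-based discretization.

```python
from typing import List, Sequence, Tuple


def interval_cost(counts: Sequence[int], cost: Sequence[Sequence[float]]) -> float:
    """Cheapest cost of giving a single label to one interval.

    counts[c] is the number of examples of true class c in the interval;
    cost[c][p] is the (assumed) cost of predicting p when the truth is c.
    """
    n_classes = len(counts)
    return min(
        sum(counts[c] * cost[c][p] for c in range(n_classes))
        for p in range(n_classes)
    )


def best_split(
    values: Sequence[float],
    labels: Sequence[int],
    cost: Sequence[Sequence[float]],
    max_intervals: int,
) -> Tuple[float, List[float]]:
    """Return (total cost, cut points) of the cheapest split of a non-empty
    attribute into at most max_intervals intervals."""
    n_classes = len(cost)

    # Group examples by distinct attribute value; cuts may only fall between groups.
    distinct: List[float] = []
    block_counts: List[List[int]] = []
    for v, y in sorted(zip(values, labels)):
        if not distinct or distinct[-1] != v:
            distinct.append(v)
            block_counts.append([0] * n_classes)
        block_counts[-1][y] += 1
    m = len(distinct)

    # prefix[i][c] = number of class-c examples in the first i groups.
    prefix = [[0] * n_classes]
    for bc in block_counts:
        prefix.append([p + b for p, b in zip(prefix[-1], bc)])

    def seg(i: int, j: int) -> float:
        """Cost of a single interval covering groups i .. j-1."""
        counts = [prefix[j][c] - prefix[i][c] for c in range(n_classes)]
        return interval_cost(counts, cost)

    inf = float("inf")
    # dp[k][j] = minimal cost of covering the first j groups with exactly k intervals.
    dp = [[inf] * (m + 1) for _ in range(max_intervals + 1)]
    back = [[0] * (m + 1) for _ in range(max_intervals + 1)]
    dp[0][0] = 0.0
    for k in range(1, max_intervals + 1):
        for j in range(1, m + 1):
            for i in range(k - 1, j):
                if dp[k - 1][i] == inf:
                    continue
                candidate = dp[k - 1][i] + seg(i, j)
                if candidate < dp[k][j]:
                    dp[k][j] = candidate
                    back[k][j] = i

    # Prefer the fewest intervals among the cheapest solutions.
    best_k = min(range(1, max_intervals + 1), key=lambda k: (dp[k][m], k))

    # Walk the back-pointers to recover the cut points (midpoints between groups).
    cuts: List[float] = []
    j = m
    for k in range(best_k, 0, -1):
        i = back[k][j]
        if i > 0:
            cuts.append((distinct[i - 1] + distinct[i]) / 2)
        j = i
    return dp[best_k][m], sorted(cuts)
```

As a toy usage example (made-up data, not taken from the paper), take values 1 to 6 with labels `[0, 0, 1, 0, 1, 1]` and at most two intervals. With the 0/1 cost matrix `[[0, 1], [1, 0]]` the sketch returns a cut at 2.5; with the skewed matrix `[[0, 5], [1, 0]]`, where predicting class 1 for a true class 0 is five times as expensive, the cut moves to 4.5. Both splits have total cost 1.0, which shows how the cost matrix alone can change where the boundary is placed.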
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brijs, T., Vanhoof, K. (1998). Cost sensitive discretization of numeric attributes. In: Żytkow, J.M., Quafafou, M. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1998. Lecture Notes in Computer Science, vol 1510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094810
DOI: https://doi.org/10.1007/BFb0094810
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65068-3
Online ISBN: 978-3-540-49687-8
eBook Packages: Springer Book Archive