Abstract
The discretization of continuous attributes in a training set is an important issue that significantly affects the performance of decision trees. This paper proposes a method for discretizing continuous attributes based on a statistical test modeled as a random walk. The algorithm searches for the point that divides the training set T into two subsets T1 and T2, with T = T1 ∪ T2, such that T1 contains as many instances of a majority class as possible. In other words, it detects the splitting point that yields the maximum discrepancy between two empirical distributions: that of the majority class and that of the remaining classes. The algorithm applies this procedure recursively until a statistical stopping criterion is satisfied. We further report on the effectiveness of the algorithm compared with ChiMerge and MDLPC in experiments on datasets from the UCI repository.
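The core split-point search described in the abstract can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: it scans the sorted attribute values and picks the cut maximizing the gap between the empirical distributions of the majority class and the rest (a Kolmogorov–Smirnov-style discrepancy; all names are illustrative).

```python
# Hypothetical sketch of the split-point search: sort instances by the
# continuous attribute and find the cut that maximizes the discrepancy
# between the empirical CDF of the majority class and that of the
# remaining classes. Illustrative only; not the paper's exact test.
from collections import Counter

def best_split(values, labels):
    """Return (cut_value, max_discrepancy) for one continuous attribute."""
    pairs = sorted(zip(values, labels))
    majority = Counter(labels).most_common(1)[0][0]
    n_maj = sum(1 for lab in labels if lab == majority)
    n_rest = len(labels) - n_maj
    if n_rest == 0:  # only one class present: nothing to split
        return None, 0.0
    seen_maj = seen_rest = 0
    best_cut, best_d = None, -1.0
    for i, (v, lab) in enumerate(pairs[:-1]):
        if lab == majority:
            seen_maj += 1
        else:
            seen_rest += 1
        # difference between the two empirical CDFs at this candidate cut
        d = abs(seen_maj / n_maj - seen_rest / n_rest)
        # only cut between distinct attribute values
        if d > best_d and pairs[i + 1][0] != v:
            best_cut = (v + pairs[i + 1][0]) / 2
            best_d = d
    return best_cut, best_d
```

On a cleanly separated toy attribute such as values `[1, 2, 3, 10, 11, 12]` with labels `['a', 'a', 'a', 'b', 'b', 'b']`, the maximum discrepancy of 1.0 is reached at the midpoint cut 6.5; the recursive variant in the paper would then repeat this search within each resulting interval until the statistical criterion stops it.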
References
Blake, C., Keogh, E., and Merz, C.J.: UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, Irvine, CA: University of California, Department of Information and Computer Science (1998)
Catlett, J.: On Changing Continuous Attributes into Ordered Discrete Attributes, Proceedings of the European Working Session on Learning (1991) 164–178
Dougherty, J., Kohavi, R., and Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features, Proceedings of the 12th International Conference on Machine Learning (1995) 194–202
Fayyad, U.M. and Irani, K.B.: Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, Proceedings of the 13th International Joint Conference on Artificial Intelligence (1993) 1022–1027
Feller, W.: An Introduction to Probability Theory and Its Applications Vol.2, First Edition, John Wiley & Sons, New York (1966)
Kerber, R.: ChiMerge: Discretization of Numeric Attributes, Proceedings of the 10th National Conference on Artificial Intelligence (1992) 123–128
Quinlan, J.R.: Induction of Decision Trees, Machine Learning, Vol.1 (1986) 81–106
Quinlan, J.R.: C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA (1993)
Rissanen, J.: Modeling by Shortest Data Description, Automatica, Vol.14 (1978) 465–471
Russell, S.J. and Norvig, P.: Artificial Intelligence A Modern Approach, Prentice-Hall (1995)
Schaffer, C.: Selecting a Classification Method by Cross-Validation, Machine Learning, Vol.13, No.1 (1993) 135–143
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Hanaoka, M., Kobayashi, M., Yamazaki, H. (2000). RWS (Random Walk Splitting): A Random Walk Based Discretization of Continuous Attributes. In: Mizoguchi, R., Slaney, J. (eds) PRICAI 2000 Topics in Artificial Intelligence. PRICAI 2000. Lecture Notes in Computer Science(), vol 1886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44533-1_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67925-7
Online ISBN: 978-3-540-44533-3