Abstract
Machine learning (ML) algorithms have been capable of processing only symbolic, categorical data. Real-world problems, particularly in medicine, comprise not only symbolic but also numerical attributes. There are several approaches to discretizing (categorizing) numerical attributes. This chapter describes two newer algorithms for such discretization.
The first one was designed and implemented in KEX (Knowledge Explorer) as its preprocessing procedure. The other discretization procedure was designed for the CN4 algorithm, a substantial extension of the well-known CN2. The discretization procedure in CN4 works on-line, i.e., it discretizes numerical attributes dynamically, during induction.
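The off-line regime described above can be illustrated with a generic preprocessing step. The sketch below uses plain equal-frequency binning; it is illustrative only, since the actual KEX procedure is class-sensitive and is not reproduced here, and the function name and default bin count are our own choices:

```python
def discretize_offline(values, n_bins=4):
    """Off-line (preprocessing) discretization by equal-frequency binning.

    Illustrative sketch only: the real KEX preprocessing procedure is
    class-sensitive; `n_bins` and the binning criterion are assumptions.
    """
    ordered = sorted(values)
    # Choose n_bins - 1 cut points at (roughly) equal-frequency positions
    cuts = [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]

    def interval(x):
        # Index of the interval that x falls into (0 .. n_bins - 1)
        return sum(x >= c for c in cuts)

    # Replace each numeric value by its interval index
    return [interval(x) for x in values], cuts
```

An on-line procedure such as the one in CN4 would instead recompute candidate cut points during rule induction, using the training examples still covered at each step.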
A significant drawback of these discretization procedures, whether off-line or on-line, is that they generate sharp bounds between intervals. One way to eliminate the impurity around the interval borders is to fuzzify them. Here we introduce the newest empirical procedures for fuzzification, both off-line (within KEX) and on-line (within CN4).
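The idea of fuzzifying a sharp bound can be sketched with a trapezoidal membership function: inside the interval an example belongs to it fully, and near each border the membership degree falls off linearly instead of dropping abruptly. This is a minimal sketch under our own assumptions (a symmetric linear ramp of width `slope`); the chapter's actual fuzzification procedures are empirical and not reproduced here:

```python
def fuzzy_membership(x, left, right, slope):
    """Degree to which x belongs to the interval [left, right] when its
    sharp bounds are fuzzified by a linear ramp of width `slope`.

    Returns 1.0 well inside the interval, 0.0 well outside, and a value
    in between near either border.
    """
    if x <= left - slope or x >= right + slope:
        return 0.0          # clearly outside the fuzzified interval
    if left <= x <= right:
        return 1.0          # clearly inside the crisp interval
    if x < left:
        return (x - (left - slope)) / slope   # rising ramp at the left border
    return ((right + slope) - x) / slope      # falling ramp at the right border
```

With sharp bounds, an example just beyond a cut point contributes nothing to the interval; with the ramp above, it still contributes with a reduced weight, which is what mitigates the impurity around the borders.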
This chapter first surveys the methodology of empirical machine learning (Section 1), then attribute-based rule-inducing learning from examples (Section 2). Section 3 briefly introduces the KEX algorithm and Section 4 surveys CN4. The final section focuses on the discretization and fuzzification procedures, includes empirical results comparing the performance of KEX, CN4, and other well-known machine learning algorithms with respect to discretization and fuzzification, and concludes with an analysis.
References
P. Berka, I. Bruha (1995): Various discretizing procedures of numerical attributes: Empirical comparisons. European Conf. on Machine Learning, Workshop on Statistics, Machine Learning, and Knowledge Discovery in Databases, Heraklion, Crete, 136–141.
P. Berka, J. Ivanek (1994): Automated knowledge acquisition for PROSPECTOR-like expert systems. ECML-94, Springer-Verlag, 339–342.
P.B. Brazdil, I. Bruha (1992): A note on processing missing attribute values: A modified technique. Workshop on Machine Learning, Canadian Conf. on AI, Vancouver.
D. Biggs, B. de Ville, E. Suen (1991): A method of choosing multiway partitions for classification and decision trees. J. Applied Statistics, 18, 1, 49–62.
I. Bruha, S. Kockova (1993): Quality of decision rules: Empirical and statistical approaches. Informatica, 17, 233–243.
I. Bruha, S. Kockova (1993): A covering learning algorithm for cost-sensitive and noisy environments. European Conf. on Machine Learning, Workshop on Learning Robots, Vienna.
I. Bruha (1996): Quality of decision rules: Definitions and classification schemes for multiple rules. In: G. Nakhaeizadeh, C.C. Taylor (eds.): Machine Learning and Statistics: The Interface. John Wiley, 107–131.
I. Bruha, S. Kockova (1994): A support for decision making: Cost-sensitive learning system. Artificial Intelligence in Medicine, 6, 67–82.
J. Catlett (1991): On changing continuous attributes into ordered discrete attributes. EWSL-91, Porto, Springer-Verlag, 164–178.
B. Cestnik, I. Kononenko, I. Bratko (1988): Assistant 86: A knowledge-elicitation tool for sophisticated users. In: I. Bratko, N. Lavrac (eds.): Progress in Machine Learning. Proc. EWSL'88, Sigma Press, 31–46.
P. Clark, R. Boswell (1991): Rule induction with CN2: Some recent improvements. EWSL-91, Porto.
J.G. Carbonell, R.S. Michalski, T.M. Mitchell (1983): An overview of machine learning. In [19].
P. Clark, T. Niblett (1989): The CN2 induction algorithm. Machine Learning, 3, 261–283.
R.O. Duda, J. Gaschnig (1979): Model design in the PROSPECTOR consultant system for mineral exploration. In: D. Michie (ed.): Expert Systems in the Micro Electronic Age, Edinburgh University Press, UK.
P. Hajek (1985): Combining functions for certainty factors in consulting systems. Int. J. Man-Machine Studies, 22, 59–76.
C. Lee, D. Shin (1994): A context-sensitive discretization of numeric attributes for classification learning. ECAI-94, Amsterdam, John Wiley, 428–432.
R.S. Michalski (1980): Pattern recognition as rule-guided inductive inference. IEEE Trans. PAMI-2, 4, 349–361.
R.S. Michalski et al. (1986): The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. Proc. 5th AAAI, 1041–1045.
R.S. Michalski, J.G. Carbonell, T.M. Mitchell (eds.) (1983): Machine Learning: An Artificial Intelligence Approach, I. Tioga Publ.
M. Nunez (1988): Economic induction: A case study. EWSL'88, Glasgow, 139–145.
J.R. Quinlan (1986): Induction of decision trees. Machine Learning, 1, 81–106.
J.R. Quinlan (1987): Simplifying decision trees. Int. J. Man-Machine Studies, 27, 221–234.
J.R. Quinlan (1989): Unknown attribute values in ID3. Int'l Conf. on Machine Learning, 164–168.
J.R. Quinlan (1994): C4.5: Programs for Machine Learning. Morgan Kaufmann Publ.
H.A. Simon (1983): Why should machines learn? In [19].
M. Tan, J.C. Schlimmer (1990): Two case studies in cost-sensitive concept acquisition. 8th Conf. on AI.
J. Zeidler, M. Schlosser (1995): Fuzzy handling of continuous-valued attributes in decision trees. 8th European Conf. on Machine Learning, Workshop on Statistics, Machine Learning, and Knowledge Discovery in Databases, Heraklion, Crete, 41–46.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
Cite this chapter
Bruha, I., Berka, P. (2000). Discretization and Fuzzification of Numerical Attributes in Attribute-Based Learning. In: Szczepaniak, P.S., Lisboa, P.J.G., Kacprzyk, J. (eds) Fuzzy Systems in Medicine. Studies in Fuzziness and Soft Computing, vol 41. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1859-8_6
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-662-00395-4
Online ISBN: 978-3-7908-1859-8