Fuzziness and Performance: An Empirical Study with Linguistic Decision Trees
There are two main streams of theory for studying uncertainty: probability theory and fuzzy set theory. A basic question in fuzzy set theory is how to define and interpret membership functions. In this paper, we study a tree-structured data mining model based on a new interpretation of fuzzy theory, in which fuzzy labels are used for modelling. The membership function is interpreted as the degree of appropriateness of using a label to describe a fuzzy concept, and each fuzzy concept is modelled by a distribution over the appropriate sets of fuzzy labels. Previous work has shown that the new model outperforms some well-known data mining models such as Naive Bayes and decision trees. However, the fuzzy labels used in previous work were predefined. Here we study how the performance is influenced by using fuzzy labels with different degrees of overlap. We test the model on a series of UCI datasets, and the results show that its performance increases almost monotonically as the overlap between fuzzy labels increases. For this empirical study with the linguistic decision tree (LDT) model, we conclude that more fuzziness implies better performance.
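To make the notions of appropriateness degree, distribution over label sets, and degree of overlap concrete, the following is a minimal sketch and not the implementation used in the study. It assumes uniformly spaced trapezoidal labels whose slopes are controlled by a hypothetical `overlap` parameter in [0, 1] (0 gives crisp intervals), and it assumes the consonant mapping from appropriateness degrees to masses on nested label sets that is commonly used in label semantics. The function names and the parameterisation are illustrative assumptions; the abstract does not specify them.

```python
from typing import Dict, FrozenSet, List


def ramp(x: float, a: float, b: float) -> float:
    """Linear rise from 0 at a to 1 at b; a step at a when a == b."""
    if x >= b:
        return 1.0
    if x <= a:
        return 0.0
    return (x - a) / (b - a)


def appropriateness(x: float, lo: float, hi: float,
                    names: List[str], overlap: float) -> Dict[str, float]:
    """Appropriateness degree of each label for the value x.

    Assumed layout: [lo, hi] is cut into len(names) equal bins; each label
    is 1 on its own bin and slopes into its neighbours' bins over a
    distance of overlap * (bin width) / 2.
    """
    n = len(names)
    width = (hi - lo) / n
    half = overlap * width / 2.0
    degrees = {}
    for i, name in enumerate(names):
        left, right = lo + i * width, lo + (i + 1) * width
        # The first and last labels are open shoulders with no outer slope.
        rise = 1.0 if i == 0 else ramp(x, left - half, left)
        fall = 1.0 if i == n - 1 else 1.0 - ramp(x, right, right + half)
        degrees[name] = min(rise, fall)
    return degrees


def mass_assignment(degrees: Dict[str, float]) -> Dict[FrozenSet[str], float]:
    """Consonant mass assignment (assumed): nested label sets ordered by degree."""
    ranked = sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)
    masses: Dict[FrozenSet[str], float] = {}
    for k, (_, mu) in enumerate(ranked):
        nxt = ranked[k + 1][1] if k + 1 < len(ranked) else 0.0
        if mu - nxt > 0.0:
            # Set of the k+1 most appropriate labels gets the drop in degree.
            masses[frozenset(name for name, _ in ranked[:k + 1])] = mu - nxt
    if ranked[0][1] < 1.0:
        # Any remaining mass goes to the empty set.
        masses[frozenset()] = 1.0 - ranked[0][1]
    return masses


if __name__ == "__main__":
    labels = ["small", "medium", "large"]
    for p in (0.0, 0.5, 1.0):
        mu = appropriateness(12.0, 0.0, 30.0, labels, p)
        print(p, mu, mass_assignment(mu))
```

Under these assumptions, a value near a bin boundary is described by a single label set when `overlap` is 0 and by a distribution over several nested label sets as `overlap` grows, which is the sense in which larger overlap means more fuzziness.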
Keywords: Information Gain, Fuzzy Concept, Focal Element, Classical Decision Tree, Label Semantic
- 1. Blake, C., Merz, C.J.: UCI machine learning repository, http://www.ics.uci.edu/~mlearn/MLRepository.html
- 5. Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81–106 (1986)
- 6. Quinlan, J.R.: Decision trees as probabilistic classifiers. In: Proceedings of the 4th International Workshop on Machine Learning, pp. 31–37. Morgan Kaufmann, San Francisco (1987)
- 7. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)