Abstract
Classification, the task of assigning objects to one of several predefined categories, is a pervasive problem that encompasses many diverse applications. The decision tree classifier, a simple yet widely used classification technique, employs training data to yield decision rules; it can also handle continuous attributes by creating thresholds that split their values into discrete intervals (Quinlan in Journal of Artificial Intelligence Research 4:77–90, 1996). Rough set theory (Pawlak in International Journal of Computer and Information Sciences 11:341–356, 1982; International Journal of Man-Machine Studies 20:469–483, 1984; Rough sets: theoretical aspects of reasoning about data. Kluwer, Dordrecht, 1991) has been applied to a wide variety of decision analysis problems for the extraction of rules from databases. This paper proposes a hybrid approach that combines the decision tree and rough sets classifiers and applies it to plant classification. The approach first uses a decision tree classifier (C4.5) as a preprocessing technique to perform interval-discretization and then uses the rough set method to extract rules. It aims to find classification rules by analyzing lamina attributes (leaf stalk, leaf width, leaf length, length/width ratio) of Cinnamomum, which were gathered and measured by plant specialists in the field in Taiwan. A comparison with widely used algorithms (e.g., decision tree, multilayer perceptrons, naïve Bayes, and rough sets classifier) is carried out to show the advantages of the proposed approach. Finally, the approach is applied to test data in which the species are unknown, and the classification results are confirmed by consulting the relevant plant specialists.
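The two-stage idea described in the abstract (discretize continuous attributes with decision-tree-style thresholds, then extract rules with rough sets) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the measurements and species labels are invented toy values, not the paper's Cinnamomum data; `best_cut` uses a single information-gain cut in the spirit of C4.5 rather than the full C4.5 algorithm; and `certain_rules` keeps only the rules supported by the rough-set lower approximation.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return the midpoint threshold with the highest information gain, or None."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - weighted > best_gain:
            best_gain, best_t = base - weighted, t
    return best_t


def discretize(value, cut):
    """Map a continuous value to one of two intervals around the cut."""
    return "low" if value <= cut else "high"


def certain_rules(rows, decisions):
    """Keep condition patterns whose objects all share one decision
    (the rough-set lower approximation); inconsistent patterns are dropped."""
    blocks = defaultdict(set)
    for cond, dec in zip(rows, decisions):
        blocks[cond].add(dec)
    return {cond: next(iter(decs)) for cond, decs in blocks.items() if len(decs) == 1}


# Hypothetical toy measurements (cm) and labels, for illustration only.
leaf_length = [6.0, 6.5, 7.0, 9.0, 9.5, 10.0]
leaf_width = [2.0, 3.0, 2.2, 3.1, 2.1, 3.2]
species = ["A", "A", "A", "B", "B", "B"]

# Stage 1: one cut point per continuous attribute.
length_cut = best_cut(leaf_length, species)
width_cut = best_cut(leaf_width, species)

# Stage 2: discretize, then extract the certain rules.
rows = [
    (discretize(l, length_cut), discretize(w, width_cut))
    for l, w in zip(leaf_length, leaf_width)
]
rules = certain_rules(rows, species)

for (length_bin, width_bin), label in sorted(rules.items()):
    print(f"IF length={length_bin} AND width={width_bin} THEN species={label}")
```

In this toy setting every discretized pattern is consistent, so each one yields a rule. When two objects share the same intervals but differ in species, `certain_rules` omits that pattern, which is where the rough-set boundary region would come into play.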
References
Breiman L, Friedman JH, Olshen R, Stone CJ (1984) Classification and regression trees. Chapman and Hall, New York
Catlett J (1991) On changing continuous attributes into ordered discrete attributes. In: European Working Session on Learning
Dimitras AL, Slowinski R, Susmaga R, Zopounidis C (1999) Business failure prediction using rough sets. European Journal of Operational Research 114:263–280
Duda RO, Hart PE (1973) Pattern classification and scene analysis. (Q327.D83) Wiley, New York, p 218. ISBN 0-471-22361-1
Fayyad UM, Irani KB (1992) On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8:87–102
Grzymala-Busse JW (1997) A new version of the rule induction system LERS. Fundamenta Informaticae 31:27–39
Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann, San Francisco
Judd WS, Campbell CS, Kellogg EA, Stevens PF (1999) Plant systematics: a phylogenetic approach. Sinauer Associates, Sunderland, MA
Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. Applied Statistics 29:119–127
Kerber R (1992) ChiMerge: discretization of numeric attributes, Proceedings AAAI–92, ninth international conference artificial intelligence, pp 123–128
Khuroo AA, Dar GH, Khan ZS, Malik AH (2007) Exploring an inherent interface between taxonomy and biodiversity: current problems and future challenges. Journal for Nature Conservation 15:256–261
Lawrence GHM (1955) Taxonomy of vascular plants. Prentice Hall College Div, New York
Mitra S, Acharya T (2003) Data mining: multimedia, soft computing, and bioinformatics. Wiley, New Jersey
Murthy SK (1998) Automatic construction of decision trees from data: a multi-disciplinary survey. Data Mining and Knowledge Discovery 2(4):345–389
Pawlak Z (1982) Rough sets. International Journal of Computer and Information Sciences 11:341–356
Pawlak Z (1984) Rough classification. International Journal of Man-Machine Studies 20:469–483
Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer, Dordrecht
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco
Predki B, Slowinski R, Stefanowski J, Susmaga R, Wilk Sz (1998) ROSE-software implementation of the rough set theory. In: Polkowski L, Skowron A (eds) Rough sets and current trends in computing, lecture notes in artificial intelligence, vol 1424. Springer-Verlag, Berlin, pp 605–608
Quinlan JR (1986) Induction of decision trees. Machine Learning 1:81–106
Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
Quinlan JR (1996) Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4:77–90
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representation by error propagation. Parallel Distributed Processing 1:318–362
Safavian SR, Landgrebe D (1998) A survey of decision tree classifier methodology. IEEE Trans. Systems, Man and Cybernetics 22:660–674, May/June
Samuel BJ, Luchsinger AE (1979) Plant systematics. McGraw-Hill, Columbus
Skrypnyk I (2002) Comparison of feature selection strategies for hearing impairments diagnostics, proceedings of the 15th IEEE symposium on computer-based medical systems (CBMS 2002)
Tan P, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Education, New York
Yang Y, Liu H, Lu S (1999) Introduction to vascular plants in Taiwan. Council of Agriculture, Taipei, Taiwan
Cite this article
Cheng, CH., Chen, YH. & Liu, JW. Classifying Cinnamomums using rough sets classifier based on interval-discretization. Plant Syst Evol 280, 89–97 (2009). https://doi.org/10.1007/s00606-009-0161-0
DOI: https://doi.org/10.1007/s00606-009-0161-0