
Soft Computing, Volume 15, Issue 6, pp 1129–1136

Case study of inaccuracies in the granulation of decision trees

  • Salman Badr
  • Andrzej Bargiela

Abstract

Cybernetics studies information processes in the context of interaction with physical systems. Because such information is sometimes vague and exhibits complex interactions, it can only be discerned using approximate representations. Machine learning provides solutions that create approximate models of information, and decision trees are one of its main components. However, decision trees are susceptible to information overload and can become overly complex when a large amount of data is input into them. Granulation of decision trees remedies this problem by extracting the essential structure of the tree, although this simplification can decrease its utility. To evaluate the relationship between granulation and decision tree complexity, data uncertainty and prediction accuracy, the deficiencies recorded for nursing homes during annual inspections were taken as a case study. Using rough sets, three forms of granulation were performed: (1) attribute grouping, (2) removing insignificant attributes and (3) removing uncertain records. Attribute grouping significantly reduces tree complexity without having any strong effect on data consistency and accuracy. On the other hand, removing insignificant attributes decreases data consistency and tree complexity while increasing the prediction error. Finally, decreasing the uncertainty of the dataset increases accuracy and has no impact on tree complexity.
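The three granulation operations rest on the rough-set notion of indiscernibility: records that agree on all condition attributes form a granule, and the dataset is consistent to the degree that each granule maps to a single decision. The following is a minimal illustrative sketch (not the authors' implementation; the attribute names and toy records are invented for the example) showing how removing a condition attribute can lower this consistency measure, which is the effect the abstract reports for removing insignificant attributes:

```python
from collections import defaultdict

def indiscernibility_classes(records, attrs):
    """Group record indices by their values on the given condition attributes."""
    classes = defaultdict(list)
    for i, r in enumerate(records):
        classes[tuple(r[a] for a in attrs)].append(i)
    return list(classes.values())

def consistency(records, cond_attrs, decision_attr):
    """Fraction of records whose indiscernibility class is pure with
    respect to the decision attribute (the rough-set dependency degree)."""
    consistent = 0
    for cls in indiscernibility_classes(records, cond_attrs):
        decisions = {records[i][decision_attr] for i in cls}
        if len(decisions) == 1:       # granule maps to a single decision
            consistent += len(cls)
    return consistent / len(records)

# Hypothetical nursing-home records; attribute names are illustrative only.
data = [
    {"staffing": "low",  "size": "big",   "deficiency": "yes"},
    {"staffing": "low",  "size": "small", "deficiency": "yes"},
    {"staffing": "high", "size": "big",   "deficiency": "no"},
    {"staffing": "high", "size": "small", "deficiency": "no"},
    {"staffing": "high", "size": "small", "deficiency": "yes"},  # conflicting record
]

full = consistency(data, ["staffing", "size"], "deficiency")      # 0.6
reduced = consistency(data, ["staffing"], "deficiency")           # 0.4
```

Dropping the `size` attribute merges previously distinguishable records into one granule with conflicting decisions, so consistency falls; conversely, deleting the conflicting record (the third granulation operation) would raise consistency to 1.0 without changing the attribute set.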

Keywords

Rough set-based decision trees · Granulation · Accuracy · Complexity · Uncertainty · Attribute reduction


Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. School of Computer Science, Faculty of Science, University of Nottingham, Semenyih, Malaysia
