Using Reliable Short Rules to Avoid Unnecessary Tests in Decision Trees

  • Hyontai Sug
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4293)

Abstract

It is known that in decision trees the reliability of lower branches is worse than that of upper branches because of the data fragmentation problem. As a result, unnecessary attribute tests may be performed, since a decision tree may require tests that are not the best for some subsets of the data objects. To compensate for this weakness of decision trees, we suggest using reliable short rules together with the decision tree, where the short rules come from a limited application of association rule finding algorithms. Experiments show that the method can not only generate more reliable decisions but also save test costs by using the short rules.
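
The combination described above can be illustrated with a short sketch. The code below is not the authors' implementation; it only shows, under assumed support and confidence thresholds and a brute-force miner, how short high-confidence class-association rules (antecedents of one or two attribute-value pairs) could be consulted first, with the decision tree used only as a fallback, so that reliable short rules spare the deeper, fragmented branches. The function names (`mine_short_rules`, `classify`) and all parameters are illustrative.

```python
# Minimal sketch (illustrative only): mine short, high-confidence
# class-association rules and consult them before falling back to a
# decision tree, so that one- or two-attribute tests can replace
# deeper, less reliable tree branches.

from itertools import combinations
from collections import Counter

def mine_short_rules(X, y, max_len=2, min_support=0.05, min_conf=0.95):
    """Brute-force search for short rules {attr=value, ...} -> class.

    X is a list of dicts mapping attribute names to categorical values,
    y the corresponding class labels. Thresholds are placeholders.
    """
    n = len(y)
    rules = []
    attrs = list(X[0].keys())
    for k in range(1, max_len + 1):
        for attr_subset in combinations(attrs, k):
            counts = Counter()   # (antecedent, class) co-occurrences
            totals = Counter()   # antecedent occurrences
            for row, label in zip(X, y):
                antecedent = tuple((a, row[a]) for a in attr_subset)
                totals[antecedent] += 1
                counts[(antecedent, label)] += 1
            for (antecedent, label), c in counts.items():
                support = c / n                       # rule support
                confidence = c / totals[antecedent]   # rule confidence
                if support >= min_support and confidence >= min_conf:
                    rules.append((antecedent, label, confidence))
    # Prefer the most confident, then the shortest, rules.
    rules.sort(key=lambda r: (-r[2], len(r[0])))
    return rules

def classify(row, rules, tree_predict):
    """Use a matching short rule if one exists; otherwise use the tree."""
    for antecedent, label, _conf in rules:
        if all(row.get(a) == v for a, v in antecedent):
            return label          # reliable short rule: no further tests needed
    return tree_predict(row)      # fall back to the full decision tree
```

Here `tree_predict` stands for whatever trained decision tree classifier is available (for example, a wrapper around the `predict` method of a fitted model), and the `min_support`/`min_conf` values would need to be tuned to the data at hand.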

Keywords

Decision Tree, Association Rule, Decision Tree Algorithm, Unnecessary Test, Association Algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hyontai Sug
    Division of Computer and Information Engineering, Dongseo University, Busan, South Korea