On Improving the Prediction Accuracy of a Decision Tree Using Genetic Algorithm

  • Conference paper
Advanced Data Mining and Applications (ADMA 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11323)

Abstract

Decision trees are among the most popular classifiers and are used in a wide range of real-world problems, so achieving higher prediction accuracy for them is important. Most of the well-known decision tree induction algorithms used in practice are based on greedy approaches and hence do not consider conditional dependencies among the attributes. As a result, they may generate suboptimal solutions. In the literature, genetic programming-based decision tree induction algorithms (genetic programming being a complex variant of the genetic algorithm) have often been proposed to eliminate some of the problems of greedy approaches. However, none of the algorithms proposed so far can effectively address conditional dependencies among the attributes. In this paper, we propose a new, easy-to-implement genetic algorithm-based decision tree induction technique that is more likely to ascertain conditional dependencies among the attributes. Extensive experiments are conducted on thirty well-known data sets from the UCI Machine Learning Repository to validate the effectiveness of the proposed technique.
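
The limitation of greedy splitting noted in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the toy XOR data set, the choice of information gain as the greedy criterion, and the helper names `entropy` and `information_gain` are all assumptions made for illustration. The class label depends on two attributes jointly, so a criterion that scores one attribute at a time sees no benefit in either of them.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attrs):
    """Entropy reduction from partitioning the records on the given
    attribute indices jointly (a single index mimics one greedy split)."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: the class is the XOR of attribute 0 and attribute 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Score each attribute on its own, as a greedy inducer would at the root.
single_gains = [information_gain(rows, labels, (a,)) for a in range(2)]

# Score both attributes together, which exposes the conditional dependency.
joint_gain = information_gain(rows, labels, (0, 1))

print("single-attribute gains:", single_gains)
print("joint gain:", joint_gain)
```

On this data each attribute alone yields zero information gain, while the two attributes together separate the classes completely (one bit of gain). This is the kind of conditional dependency that a one-attribute-at-a-time criterion cannot detect at the root.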

Author information

Corresponding author

Correspondence to Md. Nasim Adnan.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Adnan, M.N., Islam, M.Z., Akbar, M.M. (2018). On Improving the Prediction Accuracy of a Decision Tree Using Genetic Algorithm. In: Gan, G., Li, B., Li, X., Wang, S. (eds) Advanced Data Mining and Applications. ADMA 2018. Lecture Notes in Computer Science, vol 11323. Springer, Cham. https://doi.org/10.1007/978-3-030-05090-0_7

  • DOI: https://doi.org/10.1007/978-3-030-05090-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05089-4

  • Online ISBN: 978-3-030-05090-0

  • eBook Packages: Computer Science, Computer Science (R0)
