Soft Computing

, Volume 21, Issue 24, pp 7363–7379 | Cite as

Evolutionary induction of a decision tree for large-scale data: a GPU-based approach

  • Krzysztof Jurczuk
  • Marcin Czajkowski
  • Marek Kretowski
Methodologies and Application

Abstract

Evolutionary induction of decision trees is an emerging alternative to greedy top-down approaches. Its growing popularity results from good prediction performance and less complex output trees. However, one of the major drawbacks associated with the application of evolutionary algorithms is the tree induction time, especially for large-scale data. In the paper, we design and implement a graphics processing unit (GPU)-based parallelization of evolutionary induction of decision trees. We apply a Compute Unified Device Architecture programming model, which supports general-purpose computation on a GPU (GPGPU). The selection and genetic operators are performed sequentially on a CPU, while the evaluation process for the individuals in the population is parallelized. The data-parallel approach is applied, and thus, the parts of a dataset are spread over GPU cores. Each core processes the assigned chunk of the data. Finally, the results from all GPU cores are merged and the sought tree metrics are sent to the CPU. Computational performance of the proposed approach is validated experimentally on artificial and real-life datasets. A comparison with the traditional CPU version shows that evolutionary induction of decision trees supported by GPGPU can be accelerated significantly (even up to 800 times) and allows for processing of much larger datasets.

Keywords

Evolutionary algorithms Decision trees Parallel computing Graphics processing unit (GPU) Large-scale data 

References

  1. Alba E, Tomassini M (2002) Parallelism and evolutionary algorithms. IEEE Trans Evol Comput 6(5):443–462CrossRefGoogle Scholar
  2. Anderson DT, Luke RH, Keller JM (2008) Speedup of fuzzy clustering through stream processing on graphics processing units. IEEE Trans Fuzzy Syst 16:1101–1106CrossRefGoogle Scholar
  3. Bacardit J, Llora X (2013) Large-scale data mining using genetics-based machine learning. WIREs Data Min Knowl Discov 3:37–61CrossRefGoogle Scholar
  4. Barros RC, Basgalupp MP, Carvalho AC, Freitas AA (2012) A survey of evolutionary algorithms for decision-tree induction. IEEE Trans SMC C 42(3):291–312Google Scholar
  5. Blake C, Keogh E, Merz C (1998) UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
  6. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth Int. Group, BelmontMATHGoogle Scholar
  7. Bull L, Studley M, Bagnall A, Whittley I (2007) Learning classifier system ensembles with rule-sharing. IEEE Trans Evol Comput 11:496–502CrossRefGoogle Scholar
  8. Cano A, Zafra A, Ventura S (2012) Speeding up the evaluation phase of GP classification algorithms on GPUs. Soft Comput 16:187–202CrossRefGoogle Scholar
  9. Cano A, Olmo JL, Ventura S (2013) Parallel multi-objective ant programming for classification using GPUs. J Parallel Distrib Comput 73:713–728CrossRefGoogle Scholar
  10. Cano A, Luna JM, Ventura S (2013) High performance evaluation of evolutionary-mined association rules on GPUs. J Supercomput 66(3):1438–1461CrossRefGoogle Scholar
  11. Cano A, Luna JM, Ventura S (2014) Parallel evaluation of Pittsburgh rule-based classifiers on GPUs. Neurocomputing 126:45–57CrossRefGoogle Scholar
  12. Cano A, Ventura S (2014) GPU-parallel subtree interpreter for genetic programming. In: Proceedings of GECCO’14, pp 887–894Google Scholar
  13. Cano A, Luna JM, Ventura S (2015) Speeding up multiple instance learning classification rules on GPUs. Knowl Inf Syst 44(1):127–145CrossRefGoogle Scholar
  14. Cantu-Paz E (2000) Efficient and accurate parallel genetic algorithms. Kluwer Academic, NorwellMATHGoogle Scholar
  15. Chitty DM (2012) Fast parallel genetic programming: multi-core CPU versus many-core GPU. Soft Comput 16:1795–1814CrossRefGoogle Scholar
  16. Chitty DM (2016) Improving the performance of GPU-based genetic programming through exploitation of on-chip memory. Soft Comput 20(2):661–680CrossRefGoogle Scholar
  17. Crepinsek M, Liu S, Mernik M (2013) Exploration and exploitation in evolutionary algorithms: a survey. ACM Comput Surv 45(3):35:1–35:33CrossRefMATHGoogle Scholar
  18. Czajkowski M, Kretowski M (2014) Evolutionary induction of global model trees with specialized operators and memetic extensions. Inf Sci 288:153–173CrossRefGoogle Scholar
  19. Czajkowski M, Czerwonka M, Kretowski M (2015) Cost-sensitive global model trees applied to loan charge-off forecasting. Decis Support Syst 74:55–66CrossRefMATHGoogle Scholar
  20. Czajkowski M, Jurczuk K, Kretowski M (2015) A parallel approach for evolutionary induced decision trees. MPI+OpenMP implementation. In: Proceedings of ICAISC’15. Lecture notes in computer science, vol 9119, pp 340–349Google Scholar
  21. Esposito F, Malerba D, Semeraro G (1997) A comparative analysis of methods for pruning decision trees. IEEE Trans Pattern Anal Mach Intell 19(5):476–491CrossRefGoogle Scholar
  22. Fabris F, Krohling RA (2012) A co-evolutionary differential evolution algorithm for solving min-max optimization problems implemented on GPU using C-CUDA. Expert Syst Appl 39(12):10324–10333CrossRefGoogle Scholar
  23. Fayyad U, Piatetsky-Shapiro G, Smyth P, Uthurusamy R (1996) Advances in knowledge discovery and data mining. AAAI Press, Palo AltoGoogle Scholar
  24. Franco MA, Krasnogor N, Bacardit J (2010) Speeding up the evaluation of evolutionary learning systems using GPGPUs. In: Proceedings of GECCO 10. ACM, New York, pp 1039–1046Google Scholar
  25. Franco MA, Bacardit J (2016) Large-scale experimental evaluation of GPU strategies for evolutionary machine learning. Inf Sci 330:385–402CrossRefGoogle Scholar
  26. Freitas AA (2002) Data mining and knowledge discovery with evolutionary algorithms. Springer, SecaucusCrossRefMATHGoogle Scholar
  27. Grahn H, Lavesson N, Lapajne MH, Slat D (2011) CudaRF: a CUDA-based implementation of random forests. In: Proceedings of IEEE/ACS, pp 95–101Google Scholar
  28. Grama A, Karypis G, Kumar V, Gupta A (2003) Introduction to parallel computing. Addison-Wesley, ReadingMATHGoogle Scholar
  29. Grześ M, Kretowski M (2007) Decision tree approach to microarray data analysis. Biocybern Biomed Eng 27(3):29–42Google Scholar
  30. Hyafil L, Rivest RL (1976) Constructing optimal binary decision trees is NP-complete. Inf Process Lett 5(1):15–17CrossRefMATHGoogle Scholar
  31. Kass GV (1980) An exploratory technique for investigating large quantities of categorical data. Appl Stat 29(2):119–127CrossRefGoogle Scholar
  32. Kotsiantis SB (2013) Decision trees: a recent overview. Artif Intell Rev 39:261–283CrossRefGoogle Scholar
  33. Kretowski M (2004) An evolutionary algorithm for oblique decision tree induction. In: Proceedings of ICAISC’04. Lecture notes in computer science, vol 3070, pp 432–437Google Scholar
  34. Kretowski M, Grześ M (2005) Global learning of decision trees by an evolutionary algorithm. In: Saeed K, Pejaś J (eds) Information processing and security systems. Springer, US, pp 401–410. http://link.springer.com/chapter/10.1007%2F0-387-26325-X_36
  35. Kretowski M, Grześ M (2007) Evolutionary induction of mixed decision trees. Int J Data Wareh Min 3(4):68–82CrossRefGoogle Scholar
  36. Langdon WB (2011) Graphics processing units and genetic programming: an overview. Soft Comput 15:1657–1699CrossRefGoogle Scholar
  37. Langdon WB (2013) Large-scale bioinformatics data mining with parallel genetic programming on graphics processing units. In: Tsutsui S, Collet P (eds) Massively parallel evolutionary computation on GPGPUs, Springer, Berlin, Heidelberg, pp 311–347Google Scholar
  38. Llora X (2002) Genetics-based machine learning using fine-grained parallelism for data mining. Ph.D. Thesis. Barcelona, Ramon Llull UniversityGoogle Scholar
  39. Lo WT, Chang YS, Sheu RK, Chiu CC, Yuan SM (2014) CUDT: a CUDA based decision tree algorithm. Sci World J 1–12. http://www.hindawi.com/journals/tswj/2014/745640/
  40. Loh W (2014) Fifty years of classification and regression trees. Int Stat Rev 83(3):329–348CrossRefGoogle Scholar
  41. Luong TV, Melab N, Talbi E (2010) GPU-based island model for evolutionary algorithms. In: Proceedings of GECCO ’10. ACM, New York, pp 1089–1096Google Scholar
  42. Maitre O, Kruger F, Querry S, Lachiche N, Collet P (2012) EASEA: specification and execution of evolutionary algorithms on GPGPU. Soft Comput 16:261–279CrossRefGoogle Scholar
  43. Marron D, Bifet A, Morales GF (2014) Random forests of very fast decision trees on GPU for mining evolving big data streams. In: Proceedings of ECAI, pp 615–620Google Scholar
  44. Michalewicz Z (1996) Genetic algorithms \(+\) data structures \(=\) evolution programs, 3rd edn. Springer, BerlinCrossRefMATHGoogle Scholar
  45. Nasridonov A, Lee Y, Park YH (2014) Decision tree construction on GPU: ubiquitous parallel computing approach. Computing 96(5):403–413CrossRefGoogle Scholar
  46. NVIDIA (2015) CUDA C programming guide. Technical report. https://docs.nvidia.com/cuda/cuda-c-programming-guide/
  47. NVIDIA (2015) CUDA C best practices guide in CUDA toolkit. Technical report. https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/
  48. Oh KS, Jung K (2014) GPU implementation of neural networks. Pattern Recogn 37(6):1311–1314CrossRefMATHGoogle Scholar
  49. Oiso M, Matsumura Y, Yasuda T, Ohkura K (2011) Implementing genetic algorithms to CUDA environment using data parallelization. Tech Gaz 18(4):511–517Google Scholar
  50. Quinlan JR (1992) Learning with continuous classes. In: Proceedings of AI’92, World Scientific, pp 343–348Google Scholar
  51. Rokach L, Maimon OZ (2005) Top–down induction of decision trees classifiers—a survey. IEEE Trans SMC C 35(4):476–487Google Scholar
  52. Rokach L, Maimon OZ (2008) Data mining with decision trees: theory and application. Mach Percept Artif Intell 69. http://www.worldscientific.com/worldscibooks/10.1142/6604
  53. Soca N, Blengio JL, Pedemonte M, Ezzatti P (2010) PUGACE, a cellular evolutionary algorithm framework on GPUs. In: Proceedings of IEEE congress on evolutionary computation (CEC), pp 1–8Google Scholar
  54. Strnad D, Nerat A (2016) Parallel construction of classification trees on a GPU. Concurr Comput Pract Exp 28(5):1417–1436CrossRefGoogle Scholar
  55. Tsutsui S, Collet P (2013) Massively parallel evolutionary computation on GPGPUs. Springer, BerlinCrossRefGoogle Scholar
  56. Veronese L, Krohling R (2010) Differential evolution algorithm on the GPU with C-CUDA: In: Proceedings of IEEE congress on evolutionary computation (CEC), pp 1–7Google Scholar
  57. Wilt N (2013) Cuda handbook: a comprehensive guide to GPU programming. Addison-Wesley, ReadingGoogle Scholar
  58. Woodward JR (2003) GA or GP? That is not the question. In: Proceedings of IEEE CEC, pp 1056–1063Google Scholar
  59. Yuen D, Wang L, Chi X, Johnsson L, Ge W (2013) GPU solutions to multi-scale problems in science and engineering. Springer, BerlinCrossRefGoogle Scholar
  60. Zhu W (2011) Nonlinear optimization with a massively parallel evolution strategy–pattern search algorithm on graphics hardware. Appl Soft Comput 11:1770–1781CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Krzysztof Jurczuk
    • 1
  • Marcin Czajkowski
    • 1
  • Marek Kretowski
    • 1
  1. 1.Faculty of Computer ScienceBialystok University of TechnologyBiałystokPoland

Personalised recommendations