Scalable CAIM discretization on multiple GPUs using concurrent kernels

The Journal of Supercomputing

Abstract

Class-attribute interdependence maximization (CAIM) is one of the state-of-the-art algorithms for discretizing data for which the classes are known. However, it can take a long time to run on large-scale, high-dimensional data with a large number of attributes and/or instances. This paper addresses the problem with a graphics processing unit (GPU)-based implementation of the CAIM algorithm that significantly speeds up the discretization of large, complex data sets. The implementation scales to multiple GPU devices and exploits the concurrent kernel execution capabilities of modern GPUs. The GPU-based CAIM model is evaluated and compared with the original CAIM, in single-threaded and multi-threaded configurations, on 40 data sets with different characteristics. The results show speedups of up to 139 times using four GPUs, which makes the discretization of big data efficient and manageable. For example, the discretization time of one big data set is reduced from 2 h to less than 2 min.
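
For readers unfamiliar with the quantity that CAIM maximizes, the following is a minimal sequential sketch of the CAIM criterion as defined by Kurgan and Cios (2004): the average, over the intervals of a discretization scheme, of the squared largest class count in an interval divided by the total count in that interval. It is an illustration only and not the GPU implementation described in this paper; the function name `caim_criterion` and the example quanta matrices are assumptions made for this sketch.

```python
def caim_criterion(quanta):
    """Return the CAIM criterion for a quanta matrix.

    quanta[i][r] is the number of instances of class i whose attribute
    value falls in interval r (hypothetical layout chosen for this sketch).
    """
    n_intervals = len(quanta[0])
    total = 0.0
    for r in range(n_intervals):
        # Class counts inside interval r.
        column = [row[r] for row in quanta]
        interval_count = sum(column)
        # Skip empty intervals to avoid dividing by zero.
        if interval_count > 0:
            total += max(column) ** 2 / interval_count
    return total / n_intervals


# Two classes perfectly separated by two intervals: (16/4 + 36/6) / 2
print(caim_criterion([[4, 0], [0, 6]]))  # prints 5.0

# Two classes mixed evenly in both intervals: (4/4 + 9/6) / 2
print(caim_criterion([[2, 3], [2, 3]]))  # prints 1.25
```

Higher values indicate intervals dominated by a single class, which is why the perfectly separated example scores higher than the mixed one.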




Notes

  1. The data set descriptions, the algorithm’s source code, the experimental settings, and the results are publicly available, to facilitate replication of the experiments and future comparisons, at http://www.uco.es/grupos/kdis/kdiswiki/CAIM-GPU.


Acknowledgments

This work was supported by National Institutes of Health grant 1R01HD056235-01A1 (KJC), by the Regional Government of Andalusia and the Ministry of Science and Technology project TIN-2011-22408 (SV), and by the Ministry of Education FPU AP2010-0042 (AC). The authors also thank Duane Merrill and Sean Baxter from NVIDIA for their advice on improving the efficiency of the sorting methods.

Author information

Corresponding author

Correspondence to Sebastián Ventura.


About this article

Cite this article

Cano, A., Ventura, S. & Cios, K.J. Scalable CAIM discretization on multiple GPUs using concurrent kernels. J Supercomput 69, 273–292 (2014). https://doi.org/10.1007/s11227-014-1151-8


  • DOI: https://doi.org/10.1007/s11227-014-1151-8
