The Journal of Supercomputing

Volume 69, Issue 1, pp 273–292

Scalable CAIM discretization on multiple GPUs using concurrent kernels

  • Alberto Cano
  • Sebastián Ventura
  • Krzysztof J. Cios


Class-attribute interdependence maximization (CAIM) is one of the state-of-the-art algorithms for discretizing data for which classes are known. However, it can take a long time to run on high-dimensional, large-scale data with a large number of attributes and/or instances. This paper addresses this problem by introducing a graphics processing unit (GPU)-based implementation of the CAIM algorithm that significantly speeds up the discretization of large, complex data sets. The GPU-based implementation scales to multiple GPU devices and exploits the concurrent kernel execution capabilities of modern GPUs. The GPU-based CAIM model is evaluated on 40 data sets with different characteristics and compared with single-threaded and multi-threaded parallel configurations of the original CAIM. The results show speedups of up to 139 times using four GPUs, which makes the discretization of big data efficient and manageable. For example, the discretization time of one big data set is reduced from 2 h to less than 2 min.
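The abstract does not restate the CAIM criterion itself. As a point of reference, the sketch below follows the sequential CAIM procedure as published by Kurgan and Cios (2004), not the GPU implementation described in this paper. For an attribute split into \(n\) intervals, \(\mathrm{CAIM} = \frac{1}{n}\sum_{r=1}^{n} \frac{\max_r^2}{M_{+r}}\), where \(\max_r\) is the largest single-class count in interval \(r\) and \(M_{+r}\) is the number of instances in that interval. Candidate boundaries are midpoints between adjacent distinct values, and boundaries are added greedily while CAIM improves or while there are still fewer intervals than classes. The function names (`caim_value`, `caim_discretize`, `discretize_all`) are hypothetical. The per-attribute thread pool is only an illustrative assumption about where parallelism lies (each attribute is discretized independently); it does not reproduce the paper's multi-GPU or concurrent-kernel design, and Python threads will not yield real CPU speedups for this pure-Python code.

```python
from bisect import bisect_left
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def caim_value(values, labels, boundaries):
    """Return the CAIM criterion for a sorted boundary list [d0, ..., dn]."""
    n_intervals = len(boundaries) - 1
    # One column of the quanta matrix per interval: class -> count.
    columns = [Counter() for _ in range(n_intervals)]
    inner = boundaries[1:-1]
    for v, c in zip(values, labels):
        # Interval r covers (boundaries[r], boundaries[r + 1]] (the first
        # interval also includes d0); bisect_left counts the inner
        # boundaries strictly below v, which is exactly that index r.
        columns[bisect_left(inner, v)][c] += 1
    total = 0.0
    for col in columns:
        m_r = sum(col.values())  # M_{+r}: number of instances in interval r
        if m_r:  # guard: an empty interval contributes nothing
            total += max(col.values()) ** 2 / m_r
    return total / n_intervals


def caim_discretize(values, labels):
    """Greedy CAIM discretization of one continuous attribute."""
    distinct = sorted(set(values))
    # Candidate inner boundaries: midpoints of adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    scheme = [distinct[0], distinct[-1]]  # start with a single interval
    global_caim = 0.0
    n_classes = len(set(labels))
    k = 1  # current number of intervals
    while candidates:
        # Tentatively add each remaining candidate and keep the best one.
        best_val, best_b = max(
            (caim_value(values, labels, sorted(scheme + [b])), b)
            for b in candidates
        )
        # Accept if CAIM improves, or if there are still fewer
        # intervals than classes (k < S in the original formulation).
        if best_val > global_caim or k < n_classes:
            scheme = sorted(scheme + [best_b])
            candidates.remove(best_b)
            global_caim = best_val
            k += 1
        else:
            break
    return scheme


def discretize_all(columns, labels, workers=4):
    """Discretize each attribute independently (illustrative pool only)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda col: caim_discretize(col, labels), columns))
```

For example, `caim_discretize([1, 2, 3, 4], ['a', 'a', 'b', 'b'])` returns `[1, 2.5, 4]`: the boundary at 2.5 separates the two classes perfectly (CAIM = 2.0), and no further boundary improves on that. In this sketch every greedy step re-evaluates CAIM for each remaining candidate over all instances, which is the repeated counting work that grows quickly with the number of instances and distinct values.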


Keywords: Supervised discretization · Parallel implementation of CAIM algorithm · GPU · CUDA



This work has been supported by grant 1R01HD056235-01A1 from the National Institutes of Health (KJC), the Regional Government of Andalusia and the Ministry of Science and Technology project TIN-2011-22408 (SV), and the Ministry of Education FPU grant AP2010-0042 (AC). The authors also thank Duane Merrill and Sean Baxter from NVIDIA for their advice on improving the efficiency of the sorting methods.


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Alberto Cano (1)
  • Sebastián Ventura (1, 2), corresponding author
  • Krzysztof J. Cios (3, 4)

  1. Department of Computer Science and Numerical Analysis, University of Cordoba, Córdoba, Spain
  2. Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jidda, Saudi Arabia
  3. Department of Computer Science, Virginia Commonwealth University, Richmond, USA
  4. IITiS, Polish Academy of Sciences, Gliwice, Poland
