Scalable CAIM discretization on multiple GPUs using concurrent kernels

Cano, Alberto; Ventura, Sebastián; Cios, Krzysztof J.

doi:10.1007/s11227-014-1151-8

Published: 16 March 2014

The Journal of Supercomputing, volume 69, pages 273–292 (2014)

Alberto Cano^1,
Sebastián Ventura^1,2 &
Krzysztof J. Cios^3,4

Abstract

Class-attribute interdependence maximization (CAIM) is one of the state-of-the-art algorithms for discretizing data for which classes are known. However, it may take a long time when run on high-dimensional, large-scale data with a large number of attributes and/or instances. This paper presents a solution to this problem by introducing a graphics processing unit (GPU)-based implementation of the CAIM algorithm that significantly speeds up the discretization process on big, complex data sets. The GPU-based implementation is scalable to multiple GPU devices and makes use of the concurrent kernel execution capabilities of modern GPUs. The CAIM GPU-based model is evaluated and compared with the original CAIM using single and multi-threaded parallel configurations on 40 data sets with different characteristics. The results show a speedup of up to 139 times using four GPUs, which makes discretization of big data efficient and manageable. For example, the discretization time of one big data set is reduced from 2 h to less than 2 min.
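For readers unfamiliar with the criterion that CAIM maximizes, the following is a minimal sequential sketch, not the authors' GPU implementation. It assumes the standard definition from Kurgan and Cios (2004): given a quanta matrix with one row per class and one column per interval, the criterion is the average over the n intervals of max_r^2 / M_+r, where max_r is the largest class count in interval r and M_+r is the total count in that interval. The function names `quanta_matrix` and `caim_criterion` and the toy data are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def quanta_matrix(values, labels, boundaries, classes):
    """Count instances per (class, interval).

    boundaries is a sorted list [d0, d1, ..., dn]. Interval r covers
    (d_r, d_{r+1}], except the first interval, which also includes d0.
    """
    n_intervals = len(boundaries) - 1
    row_of = {c: i for i, c in enumerate(classes)}
    q = [[0] * n_intervals for _ in classes]
    for v, y in zip(values, labels):
        r = bisect_left(boundaries, v) - 1
        r = min(max(r, 0), n_intervals - 1)  # clamp values at the outer boundaries
        q[row_of[y]][r] += 1
    return q


def caim_criterion(q):
    """Average over intervals of (largest class count)^2 / (interval total)."""
    n_intervals = len(q[0])
    total = 0.0
    for r in range(n_intervals):
        column = [row[r] for row in q]
        m_r = sum(column)
        if m_r > 0:  # skip empty intervals to avoid division by zero
            total += max(column) ** 2 / m_r
    return total / n_intervals


# Toy attribute (assumed data): class 'a' occupies low values, 'b' high values.
values = [1, 2, 3, 4, 6, 7, 8, 9]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
classes = ["a", "b"]

good = caim_criterion(quanta_matrix(values, labels, [1, 5, 9], classes))
bad = caim_criterion(quanta_matrix(values, labels, [1, 3, 9], classes))
print(good, bad)
```

In this toy case the cut at 5 separates the two classes perfectly and scores higher than the cut at 3, which mixes them. CAIM evaluates many candidate boundaries per attribute in this way, and that repeated evaluation is the work a parallel implementation distributes.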

Notes

The description of the data sets, the algorithm’s source code, and the experimental settings and results are publicly available at http://www.uco.es/grupos/kdis/kdiswiki/CAIM-GPU to facilitate replication of the experiments and future comparisons.

Acknowledgments

This work has been supported by grant 1R01HD056235-01A1 from the National Institutes of Health (KJC), the Regional Government of Andalusia and the Ministry of Science and Technology project TIN-2011-22408 (SV), and the Ministry of Education FPU grant AP2010-0042 (AC). The authors also thank Duane Merrill and Sean Baxter from NVIDIA for their advice on improving the efficiency of the sorting methods.

Author information

Authors and Affiliations

Department of Computer Science and Numerical Analysis, University of Cordoba, Córdoba, Spain
Alberto Cano & Sebastián Ventura
Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jidda, 21589, Saudi Arabia
Sebastián Ventura
Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
Krzysztof J. Cios
IITiS Polish Academy of Sciences, Gliwice, Poland
Krzysztof J. Cios

Corresponding author

Correspondence to Sebastián Ventura.

About this article

Cite this article

Cano, A., Ventura, S. & Cios, K.J. Scalable CAIM discretization on multiple GPUs using concurrent kernels. J Supercomput 69, 273–292 (2014). https://doi.org/10.1007/s11227-014-1151-8

Issue Date: July 2014