Mapping Data Mining Algorithms on a GPU Architecture: A Study

  • Ana Gainaru
  • Emil Slusanschi
  • Stefan Trausan-Matu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6804)


Data mining algorithms are designed to extract information automatically from very large amounts of data. The datasets analysed with these techniques come from a variety of domains, from business-related fields to HPC and supercomputers. Because these datasets continue to grow at an exponential rate, research has focused on parallelizing data mining techniques. Recently, hybrid GPU architectures have started to be used for this task. However, the data transfer rate between CPU and GPU is a bottleneck for applications that deal with large data entries exhibiting numerous dependencies. In this paper we analyse how efficiently data mining algorithms can be mapped onto these architectures by extracting the common characteristics of these methods and by examining the communication patterns between the main memory and the GPU’s shared memory. We present an experimental study of the performance of memory systems on GPU architectures when dealing with data mining algorithms, and we propose performance model guidelines based on our observations.
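The transfer bottleneck described in the abstract can be illustrated with a back-of-the-envelope cost model. The sketch below is not the performance model proposed in the paper. It is a minimal illustration that assumes GPU time is simply the CPU-GPU transfer time plus the accelerated compute time. All function names, parameters, and numbers are hypothetical.

```python
def offload_time(data_bytes, link_bytes_per_s, cpu_time_s, kernel_speedup, passes=1):
    """Estimated time on the GPU: link transfers plus accelerated compute.

    Assumption (not from the paper): transfers and compute do not overlap,
    and each pass moves the whole dataset across the CPU-GPU link once.
    """
    transfer_s = passes * data_bytes / link_bytes_per_s
    compute_s = cpu_time_s / kernel_speedup
    return transfer_s + compute_s


def effective_speedup(data_bytes, link_bytes_per_s, cpu_time_s, kernel_speedup, passes=1):
    """End-to-end speedup over the CPU-only run, including transfer cost."""
    gpu_s = offload_time(data_bytes, link_bytes_per_s, cpu_time_s, kernel_speedup, passes)
    return cpu_time_s / gpu_s


if __name__ == "__main__":
    # Hypothetical workload: 1 GB of data, a 4 GB/s link, 2 s on the CPU,
    # and a kernel that runs 10x faster than the CPU code.
    single = effective_speedup(1e9, 4e9, 2.0, 10.0, passes=1)
    repeated = effective_speedup(1e9, 4e9, 2.0, 10.0, passes=8)
    print(f"one transfer:    {single:.2f}x")    # about 4.44x
    print(f"eight transfers: {repeated:.2f}x")  # about 0.91x, slower than the CPU
```

Under these assumed numbers, a kernel that is 10 times faster yields only about a 4.4 times end-to-end gain with a single transfer, and dependencies that force eight transfers make the GPU run slower than the CPU alone.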


Keywords: Execution Time, Graphic Processing Unit, Shared Memory, Main Memory, Data Mining Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ana Gainaru (1, 2)
  • Emil Slusanschi (1)
  • Stefan Trausan-Matu (1)

  1. University Politehnica of Bucharest, Romania
  2. University of Illinois at Urbana-Champaign, USA
