Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

  • Jianqiang Dong
  • Fei Wang
  • Bo Yuan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8206)

Abstract

In the big data era, the ability to mine and analyze large-scale datasets is imperative. As data become more abundant than ever, data-driven methods play a critical role in areas such as decision support and business intelligence. In this paper, we demonstrate how state-of-the-art GPUs and the Dynamic Parallelism feature of the latest CUDA platform can bring significant benefits to BIRCH, one of the best-known clustering techniques for streaming data. Experimental results show that, on a number of benchmark problems, the GPU-accelerated BIRCH can be up to 154 times faster than the CPU version, with good scalability and high accuracy. Our work suggests that massively parallel GPU computing is a promising and effective solution to the challenges of big data.

Keywords

GPU, CUDA, Dynamic Parallelism, BIRCH, Big Data, Clustering

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jianqiang Dong (1)
  • Fei Wang (1)
  • Bo Yuan (1)
  1. Intelligent Computing Lab, Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen, P.R. China