Parallel Visual Assessment of Cluster Tendency on GPU

  • Tao Meng
  • Bo Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10235)

Abstract

Determining the number of clusters in a data set is a critical issue in cluster analysis. The Visual Assessment of (cluster) Tendency (VAT) algorithm is an effective tool for investigating cluster tendency: it represents a complex data set as an intuitive matrix image. However, VAT can be computationally expensive for large data sets due to its \( O(N^{2}) \) time complexity. In this paper, we propose an efficient parallel scheme to accelerate the original VAT using NVIDIA GPUs and the CUDA architecture. We show that, on a range of data sets, the GPU-based VAT scales well and achieves significant speedups over the original algorithm.
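For orientation, the reordering step of the original sequential VAT, as it is commonly described in the literature, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the GPU scheme proposed in the paper: the function names `vat_order` and `vat_image` and the toy data are hypothetical, and the input is assumed to be a symmetric dissimilarity matrix given as a list of lists.

```python
def vat_order(R):
    """Return a VAT-style ordering of the objects behind a symmetric
    dissimilarity matrix R (list of lists). Illustrative sketch only."""
    n = len(R)
    # Start from one endpoint of the largest dissimilarity in R.
    first = max(range(n), key=lambda i: max(R[i]))
    order = [first]
    # dist[j] = smallest dissimilarity between j and any ordered object.
    dist = list(R[first])
    remaining = [j for j in range(n) if j != first]
    while remaining:
        # Prim-like step: pick the unordered object closest to the ordered set.
        nxt = min(remaining, key=lambda j: dist[j])
        order.append(nxt)
        remaining.remove(nxt)
        # Refresh the closest-distance table using the newly ordered object.
        for j in remaining:
            if R[nxt][j] < dist[j]:
                dist[j] = R[nxt][j]
    return order


def vat_image(R):
    """Reorder rows and columns of R by the VAT ordering."""
    order = vat_order(R)
    return [[R[a][b] for b in order] for a in order]


# Toy example (assumed data): five 1-D points forming two groups.
points = [0.0, 10.0, 0.5, 10.5, 1.0]
R = [[abs(p - q) for q in points] for p in points]
image = vat_image(R)
```

Displayed as a grayscale image, the reordered matrix shows groups of mutually close objects as dark blocks along the diagonal. Each of the N selection steps scans up to N candidates, which is where the quadratic cost noted in the abstract comes from.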

Keywords

Cluster analysis, Cluster tendency, VAT, GPU

Notes

Acknowledgment

This work was partially supported by the NVIDIA GPU Education Center awarded to Tsinghua University.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Intelligent Computing Lab, Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen, People’s Republic of China
