Parallel edge-based visual assessment of cluster tendency on GPU

  • Tao Meng
  • Bo Yuan
Regular Paper

Abstract

The visual assessment of (cluster) tendency (VAT) algorithm is an effective tool for investigating cluster tendency: it represents a complex dataset as an intuitive matrix image. The improved VAT (iVAT) incorporates a path-based distance metric into VAT to improve its effectiveness on datasets with complex-shaped clusters. The efficient formulation of the iVAT algorithm (efiVAT) further reduces the computational complexity of iVAT from \(O(N^3)\) to \(O(N^2)\). In this paper, we propose eVAT, an edge-based algorithm that replicates the output of efiVAT but is more efficient and better suited to parallel execution. We also propose a parallel scheme that accelerates eVAT on NVIDIA GPUs using the CUDA architecture. We show that, on a range of datasets, the GPU-based eVAT scales well and achieves significant speedups.
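As a rough illustration of the pipeline the abstract refers to, the following sequential Python sketch reorders a dissimilarity matrix in the VAT manner (a Prim-style traversal) and then fills in path-based (minimax) distances with an \(O(N^2)\) recursion in the spirit of efiVAT. It is an assumption-laden reading of the commonly published VAT and efiVAT descriptions, not the eVAT algorithm or the GPU scheme proposed in this paper; the function names `vat_order` and `path_based_distances` are hypothetical.

```python
def vat_order(D):
    """Return a VAT-style ordering of point indices for a symmetric
    dissimilarity matrix D (list of lists)."""
    n = len(D)
    # Start from an endpoint of the largest dissimilarity in D.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # dist[j] = smallest dissimilarity between j and any selected point.
    dist = list(D[start])
    while remaining:
        # Prim-style step: pick the unselected point closest to the selected set.
        nxt = min(remaining, key=lambda j: dist[j])
        order.append(nxt)
        remaining.remove(nxt)
        for j in remaining:
            dist[j] = min(dist[j], D[nxt][j])
    return order


def path_based_distances(D, order):
    """Return the matrix of minimax path distances, with rows and columns
    arranged in the given VAT order. Runs in O(N^2) time."""
    n = len(order)
    # R is D reordered by the VAT ordering.
    R = [[D[a][b] for b in order] for a in order]
    P = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        # Point r attaches to the earlier points through its nearest one, j.
        j = min(range(r), key=lambda k: R[r][k])
        for c in range(r):
            # A path from r to c goes through j; its cost is its largest edge.
            P[r][c] = R[r][j] if c == j else max(R[r][j], P[j][c])
            P[c][r] = P[r][c]
    return P


# Toy example (assumed data): five 1-D points forming two well-separated groups.
xs = [10, 0, 11, 2, 1]
D = [[abs(a - b) for b in xs] for a in xs]
order = vat_order(D)
P = path_based_distances(D, order)
```

On this toy input the ordering places the points 0, 1, 2 before 10, 11, and the transformed matrix has small values inside each group and a single large value (8, the gap between 2 and 10) between groups, which is what shows up as dark diagonal blocks in a VAT image. The inner loop over `c` is independent for each column once `j` is known, which hints at why such recursions are natural candidates for GPU parallelism.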

Keywords

Cluster analysis, Cluster tendency, VAT, EfiVAT, GPU

Notes

Acknowledgements

This work was partially supported by the NVIDIA GPU Education Center awarded to Tsinghua University.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Graduate School at Shenzhen, Tsinghua University, Shenzhen, People’s Republic of China
