The Journal of Supercomputing, Volume 71, Issue 7, pp 2365–2380

A comparative study of the parallel wavelet-based clustering algorithm on three-dimensional dataset

Abstract

Cluster analysis, a technique for grouping a set of objects into similar clusters, is an integral part of data analysis and has received wide interest among data mining specialists. The parallel wavelet-based clustering algorithm uses discrete wavelet transforms to extract the approximation component of the input data, and clusters are then detected on that component based on the connectivity of its objects. However, this algorithm suffers from inefficient I/O operations and from performance degradation caused by redundant data processing. We address these issues to improve the parallel algorithm’s efficiency, extend the algorithm by investigating two merging techniques (merge-table and priority-queue based approaches), and apply both to three-dimensional data. We then compare the two parallel WaveCluster algorithms with a parallel K-means algorithm to evaluate the effectiveness of the implemented algorithms.
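The two steps named in the abstract, extracting an approximation component and then grouping connected cells, can be illustrated with a small serial sketch. This is not the authors' parallel implementation: it assumes a single decomposition level, uses the mean of each 2x2x2 block as an unnormalised stand-in for the 3-D Haar approximation, treats cells as connected when they share a face, and assumes non-negative point coordinates already scaled to grid units. All function names and the threshold parameter are hypothetical.

```python
from collections import deque

# Face neighbours of a cell in a 3-D grid (6-connectivity is an assumption here).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def haar_approximation(counts):
    """One level of an unnormalised 3-D Haar low-pass: mean of each 2x2x2 block."""
    approx = {}
    for (x, y, z), count in counts.items():
        key = (x // 2, y // 2, z // 2)
        approx[key] = approx.get(key, 0.0) + count / 8.0
    return approx


def label_components(approx, threshold):
    """Give the same label to significant cells that are face-connected."""
    significant = {cell for cell, value in approx.items() if value >= threshold}
    labels = {}
    next_label = 0
    for start in sorted(significant):
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first flood fill over the sparse grid
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in significant and nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


def wavecluster_sketch(points, threshold=1.0):
    """Return one cluster label per point; -1 marks points in sparse cells."""
    counts = {}
    for p in points:
        cell = tuple(int(c) for c in p)  # quantise to unit grid cells
        counts[cell] = counts.get(cell, 0) + 1
    labels = label_components(haar_approximation(counts), threshold)
    return [labels.get(tuple(int(c) // 2 for c in p), -1) for p in points]
```

In this sketch the threshold plays the role of a density cutoff on the approximation grid: cells below it are treated as noise, and every remaining connected group of cells becomes one cluster. A parallel version would additionally have to merge labels across partition boundaries, which is where the merging techniques mentioned in the abstract come in.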

Keywords

Parallel clustering, Discrete wavelet transform, Improved parallel WaveCluster algorithm

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Computer Science, Utah State University, Logan, USA
