Abstract
This paper discusses the clustering quality and the time and space complexity of the hierarchical data clustering algorithm based on gravity theory. The gravity-based clustering algorithm simulates how N given nodes in a K-dimensional continuous vector space cluster under the force of gravity, provided that each node is associated with a mass. One of the main issues studied in this paper is how the order of the distance term in the denominator of the gravity force formula affects clustering quality. The study reveals that, among the hierarchical clustering algorithms invoked for comparison, only the gravity-based algorithm with a high-order distance term neither is biased towards spherical clusters nor suffers from the well-known chaining effect. Since bias towards spherical clusters and the chaining effect are two major problems with respect to clustering quality, eliminating both implies that high clustering quality is achieved. In terms of time and space complexity, the gravity-based algorithm has either lower time complexity or lower space complexity than the most well-known hierarchical data clustering algorithms, except single-link.
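The role of the distance term's order can be illustrated with a minimal sketch. The abstract does not state the force formula, so the code below assumes a Newtonian form generalized to an arbitrary exponent, F = G·m_i·m_j / r^order; the function name `gravity_forces` and the parameters `order` and `g_const` are hypothetical and are not taken from the paper.

```python
import numpy as np

def gravity_forces(positions, masses, order=2.0, g_const=1.0):
    """Net attractive force on each node (assumed inverse-power law).

    positions: (N, K) array of node coordinates.
    masses:    (N,) array of node masses.
    order:     exponent of the distance term in the denominator (assumption).
    Nodes are assumed to sit at distinct positions.
    """
    positions = np.asarray(positions, dtype=float)
    masses = np.asarray(masses, dtype=float)
    # diff[i, j] is the vector pointing from node i to node j
    diff = positions[None, :, :] - positions[:, None, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node exerts no force on itself
    magnitude = g_const * masses[:, None] * masses[None, :] / dist ** order
    unit = diff / dist[..., None]
    # Sum the pairwise force vectors acting on each node
    return (magnitude[..., None] * unit).sum(axis=1)

# Three unit-mass nodes on a line at x = 0, 1, 3
positions = np.array([[0.0], [1.0], [3.0]])
masses = np.ones(3)
low_order = gravity_forces(positions, masses, order=2.0)
high_order = gravity_forces(positions, masses, order=6.0)
```

In this toy layout the middle node is pulled towards its nearer neighbour in both cases, but with `order=6.0` the farther node contributes only 1/64 of the nearer node's pull, compared with 1/4 for `order=2.0`. Under the stated assumption, a higher order makes the nearest neighbours dominate the motion of each node.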
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oyang, YJ., Chen, CY., Yang, TW. (2001). A Study on the Hierarchical Data Clustering Algorithm Based on Gravity Theory. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_29
DOI: https://doi.org/10.1007/3-540-44794-6_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive