Skip to main content

K-Harmonic Means -A Spatial Clustering Algorithm with Boosting

  • Conference paper
  • First Online:
Temporal, Spatial, and Spatio-Temporal Data Mining (TSDM 2000)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 2007))

Abstract

We propose a new center-based iterative clustering algorithm, K- Harmonic Means (KHM), which is essentially insensitive to the initialization of the centers, demonstrated through a set of experiments. The dependency of the K-Means performance on the initialization of the centers has been a major problem; a similar issue exists for an alternative algorithm, Expectation Maximization (EM). Many have tried to generate good initializations to solve the sensitivity problem. KHM addresses the intrinsic problem by replacing the minimum distance from a data point to the centers, used in K-means, by the Harmonic Averages of the distances from the data point to all centers. KHM significantly improves the quality of clustering results comparing with both K- Means and EM. The KHM algorithm has been implemented in both sequential and parallel languages and tested on hundreds of randomly generated datasets with different data distribution and clustering characteristics.

Primary Contact: bzhang@hpl.hp.com. This document is released as a technical report in Oct. 1999, available at http://www.hpl.hp.com/techreports/1999/HPL-1999-124.html.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Anderberg, M. R. 1973. Cluster analysis for applications. Academic Press, New York. 35p.

    Google Scholar 

  2. Bay, S. D. (1999). The UCI KDD Archive [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Department of Information and Computer Science.

    Google Scholar 

  3. Bezdek, Ehrlich, & Full, “FCM: THE FUZZY c-MEANS CLUSTERING ALGORITHM”, Computers & Geosciences, v.10, pp.191–203, 1984

    Article  Google Scholar 

  4. Bradley, P., Fayyad, U. M., and Reina, C.A., “Scaling EM Clustering to Large Databases,” MS Technical Report, 1998.

    Google Scholar 

  5. Bradley, P., Fayyad, U. M., C.A., “Refining Initial Points for KM Clustering”, MS Technical Report MSR-TR-98-36, May 1998.

    Google Scholar 

  6. Bradley, P., Fayyad, U.M., and Reina, C.A., “Scaling Clustering to Large Databases”, KDD98, 1998.

    Google Scholar 

  7. Duda, R., Hart, P., “Pattern Classification and Scene Analysis”, John Wiley & Sons, 1972.

    Google Scholar 

  8. Dempster, A. P., Laird, N.M., and Rubin, D.B., “Miximum Likelyhood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.

    MATH  MathSciNet  Google Scholar 

  9. Fayyad, U. M., Piatetsky-Shapiro, G. Smyth, P. and Uthurusamy, R., “Advances in Knowledge Discovery and Data Mining”, AAAI Press 1996

    Google Scholar 

  10. Gersho & Gray, “Vector Quantization and Signal Compression”, KAP, 1992

    Google Scholar 

  11. Gill, P.E., Murray, W. and Wright, H.M., “Practical Optimization”, Academic Press, 1981.

    Google Scholar 

  12. Gonzales, T.F., “Clustering to Minimize the Maximum Intercluster Distance”, Theo. Comp. Sci. 38, p293–306, 1985.

    Article  Google Scholar 

  13. Kaufman, L. and Rousseeuw, P. J., “Finding Groups in Data: An Introduction to Cluster Analysis”, John Wiley & Sons, 1990.

    Google Scholar 

  14. MacQueen, J. 1967. Some methods for classification and analysis of multivariate observations. Pp. 281–297 in: L. M. Le Cam & J. Neyman [eds.] Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1. University of California Press, Berkeley. xvii + 666 p.

    Google Scholar 

  15. McKenzie, P. and Alder, M., “Initializing the EM Algorithm for Use in Gaussian Mixture Modeling”, The Univ. of Western Australia, Center for Information Processing Systems, Manuscript.

    Google Scholar 

  16. McLachlan, G. J. and Krishnan, T., “The EM Algorithm and Extensions.”, John Wiley & Sons, Inc., 1997

    Google Scholar 

  17. Pelleg, D. and Moore, A, “Accelerating Exact K-Means Algorithms with Geometric Reasoning”, KDD-99, Proc. of the Fifth ACM SIGKDD Intern. Conf. On Knowledge Discovery and Data Mining, page 277–281.

    Google Scholar 

  18. Rendner, R.A. and Walker, H.F., “Mixture Densities, Maximum Likelihood and The EM Algorithm”, SIAM Review, vol. 26 #2, 1984.

    Google Scholar 

  19. Selim, S.Z. and Ismail, M.A., “K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality”, IEEE Trans. On PAMI-6, #1, 1984.

    Google Scholar 

  20. Zhang, B., Hsu, M., Forman, G., “Accurate Recasting of Parameter Estimation Algorithms using Sufficient Statistics for Efficient Parallel Speed-up”, To appear in PKDD 2000, September, Lyon, France.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, B., Hsu, M., Dayal, U. (2001). K-Harmonic Means -A Spatial Clustering Algorithm with Boosting. In: Roddick, J.F., Hornsby, K. (eds) Temporal, Spatial, and Spatio-Temporal Data Mining. TSDM 2000. Lecture Notes in Computer Science(), vol 2007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45244-3_4

Download citation

  • DOI: https://doi.org/10.1007/3-540-45244-3_4

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41773-6

  • Online ISBN: 978-3-540-45244-7

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics