A Voronoi Diagram Approach to Autonomous Clustering

  • Heidi Koivistoinen
  • Minna Ruuska
  • Tapio Elomaa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4265)

Abstract

Clustering is a basic tool in unsupervised machine learning and data mining. Distance-based clustering algorithms rarely have the means to autonomously come up with the correct number of clusters from the data. A recent approach to identifying the natural clusters is to compare the point densities in different parts of the sample space.

In this paper we put forward an agglomerative clustering algorithm which accesses density information by constructing a Voronoi diagram for the input sample. The volumes of the point cells directly reflect the point density in the respective parts of the instance space. Scanning through the input points and their Voronoi cells once, we combine the densest parts of the instance space into clusters.
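
As a rough illustration of the idea, and not the authors' algorithm, the sketch below works in one dimension, where a point's Voronoi cell is simply the interval between the midpoints to its two neighbours. The function names, the clipping of the two outer cells at the data range, the median-based threshold and the `factor` parameter are all assumptions made for this example. In higher dimensions the cell volumes would come from a computational geometry library instead.

```python
from statistics import median


def voronoi_cell_lengths(xs):
    """Length of each point's 1-D Voronoi cell, for sorted, distinct xs.

    Interior cell boundaries are the midpoints between neighbours.
    The two outer cells are unbounded, so they are clipped at the
    smallest and largest point (an assumption of this sketch).
    """
    n = len(xs)
    lengths = []
    for i in range(n):
        left = xs[i] if i == 0 else (xs[i - 1] + xs[i]) / 2
        right = xs[i] if i == n - 1 else (xs[i] + xs[i + 1]) / 2
        lengths.append(right - left)
    return lengths


def cluster_1d(points, factor=2.0):
    """Group points in one left-to-right scan over their Voronoi cells.

    Hypothetical merging rule: two neighbouring points share a cluster
    when at least one of their cells is "dense", i.e. no longer than
    factor * median cell length.
    """
    xs = sorted(set(points))
    if not xs:
        return []
    lengths = voronoi_cell_lengths(xs)
    threshold = factor * median(lengths)
    clusters = [[xs[0]]]
    for i in range(1, len(xs)):
        if min(lengths[i - 1], lengths[i]) <= threshold:
            clusters[-1].append(xs[i])   # still inside a dense region
        else:
            clusters.append([xs[i]])     # sparse gap: start a new cluster
    return clusters


if __name__ == "__main__":
    print(cluster_1d([0, 1, 2, 3, 10, 11, 12, 13]))
```

The number of clusters is not passed in anywhere: it falls out of where the cells become large, which is the sense in which a density-based scan can be called autonomous.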

Our empirical experiments demonstrate that the proposed algorithm is able to come up with a high-accuracy clustering for many different types of data. The Voronoi approach clearly outperforms the k-means algorithm on data conforming to its underlying assumptions.

Keywords

Cluster Algorithm; Cluster Center; Voronoi Diagram; Delaunay Triangulation; Voronoi Cell
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Heidi Koivistoinen (1)
  • Minna Ruuska (1)
  • Tapio Elomaa (1)
  1. Institute of Software Systems, Tampere University of Technology, Tampere, Finland