
Clustering Based on Density Estimation with Sparse Grids

  • Benjamin Peherstorfer
  • Dirk Pflüger
  • Hans-Joachim Bungartz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7526)

Abstract

We present a density-based clustering method. The clusters are determined by splitting a similarity graph of the data into connected components. The splitting is accomplished by removing vertices of the graph at which an estimated density function of the data evaluates to values below a threshold. The density function is approximated on a sparse grid in order to make the method feasible in higher-dimensional settings and scalable in the number of data points. With benchmark examples we show that our method is competitive with other modern clustering methods. Furthermore, we consider a real-world example where we cluster nodes of a finite element model of a Chevrolet pick-up truck with respect to the displacements of the nodes during a frontal crash.
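As a rough illustration of the scheme described above, the following Python snippet builds a k-nearest-neighbour similarity graph, removes vertices at which an estimated density falls below a threshold, and reads off the connected components of the remaining graph as clusters. This is a minimal sketch under stated assumptions, not the authors' implementation: a Gaussian kernel density estimate stands in for the sparse-grid density estimator, and the graph construction, parameter names (k, threshold, bandwidth), and noise labelling are illustrative choices.

```python
# Sketch of density-threshold clustering on a similarity graph.
# A kernel density estimate replaces the sparse-grid estimator of the paper;
# k, threshold and bandwidth are illustrative, not the authors' settings.

import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import KernelDensity, kneighbors_graph


def density_threshold_clustering(X, k=10, threshold=0.1, bandwidth=0.5):
    """Cluster X by splitting a k-NN graph at low-density vertices."""
    n = X.shape[0]

    # 1) Density estimate at every data point (stand-in for the sparse grid).
    kde = KernelDensity(bandwidth=bandwidth).fit(X)
    density = np.exp(kde.score_samples(X))

    # 2) Similarity graph on the data (symmetrised k-nearest-neighbour graph).
    graph = kneighbors_graph(X, n_neighbors=k, mode="connectivity")
    graph = graph.maximum(graph.T)

    # 3) Remove vertices where the estimated density is below the threshold.
    keep = density >= threshold
    sub = graph[keep][:, keep]

    # 4) Connected components of the remaining graph define the clusters.
    _, sub_labels = connected_components(sub, directed=False)

    # Points at removed vertices stay unlabelled (-1), analogous to the
    # noise points of other density-based clustering methods.
    labels = np.full(n, -1, dtype=int)
    labels[keep] = sub_labels
    return labels
```

Under these assumptions, `density_threshold_clustering(X)` returns one label per data point, with -1 marking points discarded as low-density; the threshold thus plays the same role as the density threshold in the splitting step of the method.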

Keywords

clustering · density estimation · sparse grids



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Benjamin Peherstorfer (1)
  • Dirk Pflüger (2)
  • Hans-Joachim Bungartz (1)
  1. Department of Informatics, Technische Universität München, Garching, Germany
  2. SimTech/Simulation of Large Systems, IPVS, Universität Stuttgart, Stuttgart, Germany
