Patch Relational Neural Gas – Clustering of Huge Dissimilarity Datasets

  • Alexander Hasenfuss
  • Barbara Hammer
  • Fabrice Rossi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5064)

Abstract

Clustering constitutes a ubiquitous problem when dealing with huge data sets for data compression, visualization, or preprocessing. Prototype-based neural methods such as neural gas or the self-organizing map offer an intuitive and fast variant which represents data by means of typical representatives, thereby running in linear time. Recently, an extension of these methods towards relational clustering has been proposed which can handle general non-vectorial data characterized by dissimilarities only, such as alignments or general kernels. This extension, relational neural gas, is directly applicable in important domains such as bioinformatics or text clustering. However, it is quadratic in m both in memory and in time (m being the number of data points), which makes it infeasible for huge data sets. In this contribution we introduce an approximate patch version of relational neural gas which relies on the same cost function but dramatically reduces time and memory requirements. It offers a single pass clustering algorithm for huge data sets, running in constant space and linear time only.
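
A minimal pure-Python sketch may help make the abstract's two ingredients concrete. It assumes the usual relational formulation: each prototype j is a convex combination of the data points with coefficient vector alpha_j, its dissimilarity to point i is (D alpha_j)_i - 0.5 * alpha_j^T D alpha_j (exact when D holds squared Euclidean distances), and a batch step sets alpha_ji proportional to c_i * exp(-k_ij / lambda), where k_ij is the rank of prototype j for point i and c_i is a multiplicity. The patch wrapper is a hypothetical simplification, not necessarily the authors' scheme: after each patch, every prototype is replaced by its nearest patch element and carried into the next patch with the total multiplicity of its receptive field, so every dissimilarity matrix has at most (patch_size + n_protos)^2 entries. The names proto_distances, relational_ng and patch_rng, the annealing schedule and the epoch count are illustrative choices.

```python
import math
import random


def proto_distances(D, alpha):
    """dist[i][j] = (D a_j)_i - 0.5 * a_j^T D a_j for prototype coefficients a_j."""
    m = len(D)
    cols = []
    for a in alpha:
        Da = [sum(D[i][l] * a[l] for l in range(m)) for i in range(m)]
        quad = 0.5 * sum(a[i] * Da[i] for i in range(m))
        cols.append([Da[i] - quad for i in range(m)])
    # Transpose so rows index data points and columns index prototypes.
    return [[col[i] for col in cols] for i in range(m)]


def relational_ng(D, weights, n_protos, epochs=30, seed=0):
    """Weighted batch relational neural gas on an m x m dissimilarity matrix D."""
    m = len(D)
    rng = random.Random(seed)
    # Start each prototype at a randomly chosen data point (one-hot coefficients).
    alpha = []
    for idx in rng.sample(range(m), n_protos):
        a = [0.0] * m
        a[idx] = 1.0
        alpha.append(a)
    lam0, lam_end = n_protos / 2.0, 0.01
    for t in range(epochs):
        # Exponentially anneal the neighbourhood range from lam0 down to lam_end.
        lam = lam0 * (lam_end / lam0) ** (t / max(epochs - 1, 1))
        dist = proto_distances(D, alpha)
        # rank[i][j]: position of prototype j when sorted by distance to point i.
        rank = []
        for row in dist:
            order = sorted(range(n_protos), key=lambda j: row[j])
            r = [0] * n_protos
            for pos, j in enumerate(order):
                r[j] = pos
            rank.append(r)
        # Batch update: alpha_ji proportional to weight_i * exp(-rank_ij / lam).
        alpha = []
        for j in range(n_protos):
            num = [weights[i] * math.exp(-rank[i][j] / lam) for i in range(m)]
            total = sum(num)
            alpha.append([v / total for v in num])
    return alpha, proto_distances(D, alpha)


def patch_rng(points, dissim, patch_size, n_protos, epochs=30):
    """Single pass over `points` in fixed-size patches (hypothetical helper).

    Assumption: each prototype is summarised by its nearest patch element,
    weighted by the multiplicity of its receptive field, and that element
    joins the next patch. Returns a list of (representative, multiplicity).
    """
    carried = []
    for start in range(0, len(points), patch_size):
        items = carried + [(p, 1.0) for p in points[start:start + patch_size]]
        elems = [p for p, _ in items]
        w = [c for _, c in items]
        # Only this patch's dissimilarities are held in memory.
        D = [[dissim(a, b) for b in elems] for a in elems]
        k = min(n_protos, len(elems))
        alpha, dist = relational_ng(D, w, k, epochs)
        # Receptive fields: total multiplicity of the points won by each prototype.
        mass = [0.0] * k
        for i in range(len(elems)):
            winner = min(range(k), key=lambda j: dist[i][j])
            mass[winner] += w[i]
        carried = []
        for j in range(k):
            if mass[j] > 0:
                nearest = min(range(len(elems)), key=lambda i: dist[i][j])
                carried.append((elems[nearest], mass[j]))
    return carried


if __name__ == "__main__":
    pts = [x for i in range(15) for x in (0.01 * i, 10 + 0.01 * i)]
    print(patch_rng(pts, lambda a, b: (a - b) ** 2, patch_size=6, n_protos=2))
```

Called on two well-separated groups of one-dimensional values with squared difference as a stand-in dissimilarity, the sketch should return one representative per group whose multiplicities add up to the number of points. Each patch costs time quadratic in its fixed size, so the total work grows linearly with the number of points while memory stays bounded by the patch size.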

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Alexander Hasenfuss (1)
  • Barbara Hammer (1)
  • Fabrice Rossi (2)
  1. Department of Informatics, Clausthal University of Technology, Clausthal-Zellerfeld, Germany
  2. Projet AxIS, INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, Le Chesnay Cedex, France