Median Topographic Maps for Biomedical Data Sets

  • Barbara Hammer
  • Alexander Hasenfuss
  • Fabrice Rossi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5400)

Abstract

Median clustering extends popular neural data analysis methods, such as the self-organizing map and neural gas, to general data structures that are given only by a dissimilarity matrix. This yields flexible and robust methods for global data inspection, which are particularly well suited to the variety of data that occurs in biomedical domains. In this chapter, we give an overview of median clustering, its properties, and its extensions, with a particular focus on efficient implementations adapted to large-scale data analysis.
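
To make the idea concrete, the following is a minimal sketch of a batch update in the style of median neural gas, working only from a dissimilarity matrix. It is an illustration under stated assumptions, not the chapter's implementation: the function name `median_neural_gas`, its parameters (`n_prototypes`, `n_epochs`, `seed`), and the annealing schedule for the neighborhood range are all hypothetical choices. The key point it shows is that prototypes are restricted to data points, so each one can be stored as an index into the matrix.

```python
import numpy as np

def median_neural_gas(D, n_prototypes, n_epochs=20, seed=0):
    """Sketch of batch median neural gas on an (n x n) dissimilarity matrix D.

    Returns the prototype indices and the index of the winning prototype
    for every data point. Names and defaults are illustrative assumptions.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    # Prototypes are data points, so we only store their indices.
    protos = rng.choice(n, size=n_prototypes, replace=False)
    lam0 = n_prototypes / 2.0
    for epoch in range(n_epochs):
        # Assumed schedule: neighborhood range decays from lam0 to 0.01.
        lam = lam0 * (0.01 / lam0) ** (epoch / max(n_epochs - 1, 1))
        # dist[i, j]: dissimilarity between prototype i and data point j.
        dist = D[protos, :]
        # ranks[i, j]: rank of prototype i among all prototypes for point j.
        ranks = np.argsort(np.argsort(dist, axis=0), axis=0)
        h = np.exp(-ranks / lam)
        # cost[i, l] = sum_j h[i, j] * D[j, l]: cost of placing prototype i
        # on data point l; the best data point becomes the new prototype.
        cost = h @ D
        protos = np.argmin(cost, axis=1)
    assign = np.argmin(D[protos, :], axis=0)
    return protos, assign

# Usage on a toy dissimilarity matrix built from two groups on a line.
x = np.array([0.0, 1.0, 2.0, 100.0, 101.0, 102.0])
D = np.abs(x[:, None] - x[None, :])
protos, assign = median_neural_gas(D, n_prototypes=2)
```

Because every candidate position is a data point, the update needs only entries of the dissimilarity matrix, at the price of a quadratic cost per epoch in this naive form. Two prototypes may also end up on the same data point; the sketch makes no attempt to handle such collisions.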



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Barbara Hammer (1)
  • Alexander Hasenfuss (1)
  • Fabrice Rossi (2)

  1. Clausthal University of Technology, Clausthal-Zellerfeld, Germany
  2. INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, Le Chesnay Cedex, France
