Structure Preserving Embedding of Dissimilarity Data

  • Volker Roth
  • Thomas J. Fuchs
  • Julia E. Vogt
  • Sandhya Prabhakaran
  • Joachim M. Buhmann
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


Partitioning methods for observations represented by pairwise dissimilarities are studied, with particular emphasis on their behavior when applied to dissimilarity matrices that do not admit a loss-free embedding into a vector space. Specifically, the Pairwise Clustering cost function is shown to exhibit a shift-invariance property: any symmetric dissimilarity matrix can be modified to allow a vector-space representation without distorting the optimal group structure. In an approximate sense, the same holds for a probabilistic generalization of Pairwise Clustering, the so-called Wishart–Dirichlet Cluster Process. As a consequence, these clustering methods are “blind” to violations of the Euclidean or metric properties. From an application perspective, this blindness may be seen as a highly desirable feature, since it broadens the applicability of certain algorithms. From the viewpoint of theory building, however, the same property may be viewed as a “negative” result, since studying these algorithms will not yield new insights into the role of metricity in clustering problems.
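The shift invariance described above can be illustrated with a small numerical sketch. The abstract does not state the cost function, so the sketch assumes the form commonly used for Pairwise Clustering: for each cluster, the sum of its within-cluster dissimilarities divided by twice the cluster size. Under that assumption, adding a constant d0 to every off-diagonal entry changes the cost of any partition of n objects into k non-empty clusters by d0 · (n − k) / 2, which is the same for every partition, so the optimal partition is unchanged. The function names, the toy data, and the value of d0 are all hypothetical. The sketch does not compute the particular shift that makes the matrix embeddable.

```python
import itertools
import random


def pairwise_clustering_cost(D, labels):
    """Assumed cost: sum over clusters of (within-cluster sum of D) / (2 * cluster size)."""
    cost = 0.0
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        within = sum(D[i][j] for i in members for j in members)
        cost += within / (2.0 * len(members))
    return cost


def shift_off_diagonal(D, d0):
    """Add the constant d0 to every off-diagonal entry; the diagonal stays untouched."""
    n = len(D)
    return [[D[i][j] + (d0 if i != j else 0.0) for j in range(n)] for i in range(n)]


def is_canonical(labels):
    """True if labels first appear in the order 0, 1, 2, ... (one vector per partition)."""
    highest = -1
    for lab in labels:
        if lab > highest + 1:
            return False
        highest = max(highest, lab)
    return True


def partitions_into_k(n, k):
    """Enumerate every partition of n objects into exactly k non-empty clusters."""
    for labels in itertools.product(range(k), repeat=n):
        if len(set(labels)) == k and is_canonical(labels):
            yield labels


# Toy data (hypothetical): a random symmetric dissimilarity matrix with zero diagonal.
random.seed(0)
n, k, d0 = 6, 3, 2.5
D = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        D[i][j] = D[j][i] = random.random()

D_shifted = shift_off_diagonal(D, d0)
all_labels = list(partitions_into_k(n, k))

# Cost difference between shifted and original matrix, for every partition.
gaps = [
    pairwise_clustering_cost(D_shifted, lab) - pairwise_clustering_cost(D, lab)
    for lab in all_labels
]
best_original = min(all_labels, key=lambda lab: pairwise_clustering_cost(D, lab))
best_shifted = min(all_labels, key=lambda lab: pairwise_clustering_cost(D_shifted, lab))

print("expected gap d0 * (n - k) / 2:", d0 * (n - k) / 2)
print("largest deviation from it:", max(abs(g - d0 * (n - k) / 2) for g in gaps))
print("same optimal partition:", best_original == best_shifted)
```

Exhaustive enumeration is used only because the toy problem is tiny; it is there to make the invariance visible, not as a clustering method.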


Keywords: Cost Function, Affinity Propagation, Dissimilarity Matrix, Wishart Distribution, Shift Invariance



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Volker Roth (1, corresponding author)
  • Thomas J. Fuchs (2)
  • Julia E. Vogt (1)
  • Sandhya Prabhakaran (1)
  • Joachim M. Buhmann (3)
  1. Computer Science Department, University of Basel, Basel, Switzerland
  2. California Institute of Technology, Pasadena, USA
  3. Swiss Federal Institute of Technology Zurich, Zurich, Switzerland
