Clustering Pairwise Distances with Missing Data: Maximum Cuts Versus Normalized Cuts

  • Jan Poland
  • Thomas Zeugmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4265)

Abstract

Clustering algorithms based on a matrix of pairwise similarities (kernel matrix) for the data are widely known and used, a particularly popular class being spectral clustering algorithms. In contrast, algorithms working with the pairwise distance matrix have rarely been studied for clustering. This is surprising: in many applications distances are given directly, and computing similarities requires an additional step that, although computationally cheap, is error-prone because the kernel has to be chosen appropriately. This paper proposes a clustering algorithm based on the semidefinite programming (SDP) relaxation of the max-k-cut of the graph of pairwise distances, following the work of Frieze and Jerrum. We compare the algorithm with Yu and Shi’s algorithm, which is based on a spectral relaxation of the normalized k-cut. Moreover, we propose a simple heuristic for dealing with missing data, i.e., the case where some of the pairwise distances or similarities are not known. We evaluate the algorithms on the task of clustering natural language terms with the Google distance, a semantic distance recently introduced by Cilibrasi and Vitányi that uses relative frequency counts from WWW queries and is grounded in the theory of Kolmogorov complexity.
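
For orientation, the Google distance mentioned above can be sketched as a short function. This is a minimal illustration of the normalized Google distance as defined by Cilibrasi and Vitányi, not code from the paper. The function name and the parameters fx, fy, fxy, and n are hypothetical; n stands for the assumed total number of indexed pages. Returning None when a count is zero is an illustrative choice for flagging an undefined distance, not the missing-data heuristic the paper proposes.

```python
import math


def google_distance(fx, fy, fxy, n):
    """Normalized Google distance between two terms from page counts.

    fx, fy : number of pages containing each term alone (hypothetical inputs)
    fxy    : number of pages containing both terms
    n      : assumed total number of indexed pages (normalization constant)
    """
    # The logarithms are undefined for zero counts; signal "unknown" here.
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return None
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    # (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y)))
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Terms that always co-occur are at distance 0; rarer co-occurrence
# pushes the distance up.
d_same = google_distance(1000, 1000, 1000, 10**6)
d_far = google_distance(1000, 1000, 10, 10**6)
```

Applying such a function to every pair of terms yields the pairwise distance matrix that a distance-based clustering algorithm takes as input. Pairs for which it returns None correspond to entries that are not known.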


References

  1. Borchers, B., Young, J.G.: Implementation of a primal-dual method for SDP on a shared memory parallel architecture (March 27, 2006)
  2. Cilibrasi, R., Vitányi, P.M.B.: Automatic meaning discovery using Google. CWI, Amsterdam (Manuscript, 2006)
  3. Ding, C.H.Q., He, X., Zha, H., Gu, M., Simon, H.D.: A min-max cut algorithm for graph partitioning and data clustering. In: ICDM 2001: Proceedings of the 2001 IEEE International Conference on Data Mining, pp. 107–114. IEEE Computer Society, Los Alamitos (2001)
  4. Frieze, A., Jerrum, M.: Improved algorithms for max k-cut and max bisection. Algorithmica 18, 67–81 (1997)
  5. Goemans, M.X., Williamson, D.P.: .879-approximation algorithms for MAX CUT and MAX 2SAT. In: STOC 1994: Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, pp. 422–431. ACM Press, New York (1994)
  6. Graepel, T.: Kernel matrix completion by semidefinite programming. In: Dorronsoro, J.R. (ed.) ICANN 2002. LNCS, vol. 2415, pp. 694–699. Springer, Heidelberg (2002)
  7. Lanckriet, G.R.G., Cristianini, N., Bartlett, P., Ghaoui, L.E., Jordan, M.I.: Learning the kernel matrix with semidefinite programming. JMLR 5, 27–72 (2004)
  8. Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.M.B.: The similarity metric. IEEE Transactions on Information Theory 50(12), 3250–3264 (2004)
  9. Schölkopf, B., Smola, A., Müller, K.-R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation 10(5), 1299–1319 (1998)
  10. Sturm, J.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software 11(12), 625–653 (1999)
  11. Xing, E.P., Jordan, M.I.: On semidefinite relaxation for normalized k-cut and connections to spectral clustering. Technical Report UCB/CSD-03-1265, EECS Department, University of California, Berkeley (2003)
  12. Yu, S.X., Shi, J.: Multiclass spectral clustering. In: ICCV 2003: Proceedings of the Ninth IEEE International Conference on Computer Vision, pp. 313–319. IEEE Computer Society, Los Alamitos (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jan Poland (1)
  • Thomas Zeugmann (1)
  1. Division of Computer Science, Hokkaido University, Sapporo, Japan
