Unsupervised Learning: Self-aggregation in Scaled Principal Component Space

  • Chris Ding
  • Xiaofeng He
  • Hongyuan Zha
  • Horst Simon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2431)

Abstract

We demonstrate that data clustering amounts to a dynamic process of self-aggregation in which data objects move towards each other to form clusters, revealing the inherent pattern of similarity. Self-aggregation is governed by connectivity and occurs in a space obtained by a nonlinear scaling of principal component analysis (PCA). The method combines dimensionality reduction and clustering in a single framework, and it applies to both square similarity matrices and rectangular association matrices.
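To make the abstract concrete, below is a minimal sketch of the kind of scaled principal component space it refers to: the top eigenvectors of a degree-scaled similarity matrix. The toy two-cluster data, the Gaussian similarity, and the sign-based labeling are illustrative assumptions for this sketch, not the paper's datasets or its full self-aggregation dynamics.

```python
import numpy as np

# Illustrative assumption: two small, well-separated 2-D point clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),   # cluster A
               rng.normal(3.0, 0.1, (10, 2))])  # cluster B

# Pairwise Gaussian similarity matrix W (one common choice; the paper
# applies to any square similarity matrix).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2)

# Nonlinear scaling: W_hat = D^{-1/2} W D^{-1/2}, D = diag(row sums of W).
deg = W.sum(axis=1)
W_hat = W / np.sqrt(np.outer(deg, deg))

# Principal eigenvectors of the scaled matrix give the embedding.
# eigh returns eigenvalues in ascending order; the largest is 1,
# with eigenvector proportional to D^{1/2} * ones.
vals, vecs = np.linalg.eigh(W_hat)
embedding = vecs[:, -2:]  # top-2 scaled principal components

# In this space the clusters collapse toward distinct points; for two
# clusters, the sign of the second component already separates them.
labels = (vecs[:, -2] > 0).astype(int)
```

This scaled space is the same embedding that underlies normalized-cut-style spectral methods; the paper's contribution is to interpret movement within it as a self-aggregation process driven by connectivity.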

Keywords

Bipartite Graph, Data Object, Cluster Structure, Unsupervised Learn, News Article

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Chris Ding (1)
  • Xiaofeng He (1)
  • Hongyuan Zha (2)
  • Horst Simon (1)

  1. NERSC Division, Lawrence Berkeley National Laboratory, University of California, Berkeley
  2. Department of Computer Science and Engineering, Pennsylvania State University, University Park
