A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms

  • Hans-Peter Kriegel
  • Peer Kröger
  • Erich Schubert
  • Arthur Zimek
Conference paper

DOI: 10.1007/978-3-540-69497-7_27

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5069)
Cite this paper as:
Kriegel HP., Kröger P., Schubert E., Zimek A. (2008) A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms. In: Ludäscher B., Mamoulis N. (eds) Scientific and Statistical Database Management. SSDBM 2008. Lecture Notes in Computer Science, vol 5069. Springer, Berlin, Heidelberg

Abstract

Most correlation clustering algorithms rely on principal component analysis (PCA) as a correlation analysis tool. The correlation of each cluster is learned by applying PCA to a set of sample points. Since PCA is rather sensitive to outliers, if a small fraction of these points does not correspond to the correct correlation of the cluster, the algorithms are usually misled or even fail to detect the correct results. In this paper, we evaluate the influence of outliers on PCA and propose a general framework for increasing the robustness of PCA in order to determine the correct correlation of each cluster. We further show how our framework can be applied to PCA-based correlation clustering algorithms. A thorough experimental evaluation shows the benefit of our framework on several synthetic and real-world data sets.
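The outlier sensitivity described in the abstract can be illustrated with a small sketch. This is not the framework proposed in the paper; it only shows, on made-up synthetic data, how a handful of outlying points can rotate the first principal axis of an otherwise clean linear correlation. The helper names (`first_principal_axis`, `angle_to`), the data sizes, and the outlier position are all assumptions chosen for illustration.

```python
import numpy as np

def first_principal_axis(points):
    """Return the eigenvector of the sample covariance with the largest eigenvalue."""
    cov = np.cov(points, rowvar=False)          # np.cov centers the data itself
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]

def angle_to(axis, reference):
    """Angle in degrees between two directions, ignoring sign."""
    cos = abs(axis @ reference) / (np.linalg.norm(axis) * np.linalg.norm(reference))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))

rng = np.random.default_rng(0)

# Hypothetical cluster: 100 points scattered tightly around the line y = x.
t = rng.uniform(-1.0, 1.0, size=100)
clean = np.column_stack([t, t + rng.normal(0.0, 0.02, size=100)])

# Hypothetical contamination: 5 identical points far off the line (under 5% of the data).
outliers = np.tile([4.0, -4.0], (5, 1))
contaminated = np.vstack([clean, outliers])

reference = np.array([1.0, 1.0])                # true direction of the correlation
clean_angle = angle_to(first_principal_axis(clean), reference)
contaminated_angle = angle_to(first_principal_axis(contaminated), reference)

print(f"deviation without outliers: {clean_angle:.2f} degrees")
print(f"deviation with outliers:    {contaminated_angle:.2f} degrees")
```

In this setup the few outliers contribute more variance perpendicular to the line than the 100 clean points contribute along it, so plain PCA reports a strongest axis that is far from the true correlation direction.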

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Hans-Peter Kriegel (1)
  • Peer Kröger (1)
  • Erich Schubert (1)
  • Arthur Zimek (1)

  1. Institute for Informatics, Ludwig-Maximilians-Universität München
