Factor PD-Clustering

  • Cristina Tortora
  • Mireille Gettler Summa
  • Francesco Palumbo
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

Probabilistic Distance (PD) Clustering is a nonparametric probabilistic method for finding homogeneous groups in multivariate datasets with J variables and n units. PD-Clustering is an iterative algorithm that looks for a set of K group centers maximising the empirical probabilities that the n statistical units belong to a cluster. As J becomes large, the solution tends to become unstable. This paper extends PD-Clustering to the context of factorial clustering methods and shows that the Tucker3 decomposition is a consistent transformation for projecting the original data onto a subspace defined according to the same PD-Clustering criterion. The method consists of a two-step iterative procedure: a linear transformation of the initial data, followed by PD-Clustering on the transformed data. Integrating PD-Clustering with the Tucker3 factorial step makes the clustering more stable, allows datasets with large J to be handled, and allows the method to be used when clusters are not elliptical.
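
To make the clustering step concrete, the following is a minimal sketch of a plain PD-Clustering iteration in Python. It is not the authors' implementation. The abstract does not state the update rules, so the sketch assumes the usual probabilistic d-clustering scheme with Euclidean distances: the membership probability p_ik is proportional to 1/d_ik, and each center is a weighted mean with weights p_ik^2 / d_ik. The function names, the `init` argument, and the tolerance values are hypothetical, and the Tucker3 projection step is left out entirely.

```python
import math


def memberships(points, centers, eps=1e-9):
    """Return (distances, probabilities); each row of probabilities sums to 1."""
    # eps keeps a distance from being exactly zero when a unit sits on a center.
    dists = [[math.dist(x, c) + eps for c in centers] for x in points]
    probs = []
    for row in dists:
        inv = [1.0 / d for d in row]      # assumed rule: p_ik proportional to 1 / d_ik
        total = sum(inv)
        probs.append([v / total for v in inv])
    return dists, probs


def pd_clustering(points, init, n_iter=200, tol=1e-8):
    """Toy PD-Clustering from given initial centers; returns (centers, probs)."""
    centers = [list(c) for c in init]
    dim = len(points[0])
    for _ in range(n_iter):
        dists, probs = memberships(points, centers)
        new_centers = []
        for k in range(len(centers)):
            # Assumed center update: weighted mean with weights p_ik^2 / d_ik.
            w = [probs[i][k] ** 2 / dists[i][k] for i in range(len(points))]
            wsum = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][t] for i in range(len(points))) / wsum
                 for t in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:                   # stop once the centers no longer move
            break
    _, probs = memberships(points, centers)
    return centers, probs


if __name__ == "__main__":
    # Two small, well-separated groups in the plane (illustrative data only).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, probs = pd_clustering(data, init=[(2, 2), (8, 8)])
    print(centers)
    print([round(p[0], 3) for p in probs])
```

In the two-step procedure described above, a routine of this kind would be applied to the linearly transformed data rather than to the raw J variables.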

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Cristina Tortora (1, 2)
  • Mireille Gettler Summa (2)
  • Francesco Palumbo (1)

  1. Università degli Studi di Napoli Federico II, Naples, Italy
  2. CEREMADE, CNRS, Université Paris Dauphine, Paris, France
