Probabilistic Distance (PD) Clustering is a nonparametric probabilistic method for finding homogeneous groups in multivariate datasets with J variables and n units. PD Clustering is an iterative algorithm that looks for a set of K group centers maximising the empirical probabilities that the n statistical units belong to a cluster. As J becomes large, the solution tends to become unstable. This paper extends PD Clustering to the context of factorial clustering methods and shows that the Tucker3 decomposition is a consistent transformation for projecting the original data onto a subspace defined according to the same PD Clustering criterion. The method consists of a two-step iterative procedure: a linear transformation of the initial data, followed by PD Clustering on the transformed data. Integrating PD Clustering with the Tucker3 factorial step makes the clustering more stable, allows datasets with large J to be handled, and allows the method to be used when the clusters are not elliptical in shape.
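To make the iterative search for the K centers concrete, the following is a minimal sketch of plain PD Clustering only; the Tucker3 factorial step is not shown. The abstract does not spell out the update rules, so the sketch assumes the commonly used formulation in which, for each unit, the product of membership probability and distance to the center is constant across clusters. Under that assumption the probabilities are proportional to the inverse distances, and each center is a weighted mean with weights p²/d. Euclidean distance, the function names `pd_memberships` and `pd_clustering`, and the parameters `init`, `n_iter`, `tol` and `seed` are illustrative choices, not taken from the paper.

```python
import numpy as np


def pd_memberships(X, centers, eps=1e-12):
    """Return (distances, probabilities), each of shape (n, K).

    Assumption: p_k(x) * d_k(x) is constant over k, so p_k is
    proportional to 1 / d_k and each row of p sums to 1.
    """
    # Euclidean distance from every unit to every center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = 1.0 / d
    p = inv / inv.sum(axis=1, keepdims=True)
    return d, p


def pd_clustering(X, K, init=None, n_iter=200, tol=1e-8, seed=0):
    """Sketch of PD Clustering on an (n, J) data matrix X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init is None:
        # Hypothetical default: start from K distinct random units.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=K, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        d, p = pd_memberships(X, centers)
        # Weights u_k(x_i) = p_k(x_i)^2 / d_k(x_i) (assumed update rule).
        u = p ** 2 / d
        new_centers = (u.T @ X) / u.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break

    _, p = pd_memberships(X, centers)
    return centers, p


# Usage on synthetic data: two groups of 30 units with J = 2 variables.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(0.0, 0.5, size=(30, 2)),
    rng.normal(5.0, 0.5, size=(30, 2)),
])
centers_demo, p_demo = pd_clustering(X_demo, K=2)
labels_demo = p_demo.argmax(axis=1)  # crisp assignment from probabilities
```

The function returns the membership probabilities rather than hard labels, since a crisp partition can always be derived afterwards by assigning each unit to its most probable cluster.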