A Parametric Version of Probabilistic Distance Clustering
The probabilistic distance (PD) clustering method rests on the assumption that, for each statistical unit, the product of the probability that the unit belongs to a cluster and the distance between the unit and the cluster center is constant. This constant measures the classifiability of the unit, and the sum of the constants over all units is referred to as the joint distance function (JDF). The parameters that minimize the JDF maximize the classifiability of the units. This paper introduces a new distance measure based on a probability density function; specifically, we use the multivariate Gaussian and Student-t distributions. Using two simulated data sets, we show that a distance based on these two density functions improves the performance of PD clustering.
Keywords: PD clustering, Clustering algorithm, Gaussian distribution, Multivariate Student-t distribution
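The constant-product assumption described in the abstract can be illustrated with a minimal sketch. It assumes plain Euclidean distance (not the density-based distances the paper proposes) and assumes that the membership probabilities of each unit sum to one; under those assumptions the probabilities are proportional to the inverse distances. The function names `pd_memberships` and `joint_distance_function` are hypothetical and are not taken from the paper or from any R package.

```python
import math


def pd_memberships(point, centers):
    """Return (probabilities, constant) for one unit.

    Assumes p_k * d_k is the same for every cluster k and that the
    p_k sum to one, which gives p_k = (1 / d_k) / sum_j (1 / d_j).
    Euclidean distance is an illustrative choice only.
    """
    dists = [math.dist(point, c) for c in centers]

    # A unit lying exactly on one or more centers: split the full
    # probability among those centers; the constant is then zero.
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        n = sum(on_center)
        return [1.0 / n if hit else 0.0 for hit in on_center], 0.0

    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    probs = [v / total for v in inverse]
    constant = 1.0 / total  # equals p_k * d_k for every cluster k
    return probs, constant


def joint_distance_function(points, centers):
    """Sum of the per-unit constants over all units (the JDF)."""
    return sum(pd_memberships(x, centers)[1] for x in points)
```

For example, a unit at the origin with centers at (1, 0) and (3, 0) has distances 1 and 3, so under these assumptions it belongs to the nearer cluster with probability 0.75 and to the farther one with probability 0.25, and both products equal 0.75. A clustering procedure would then adjust the centers to reduce `joint_distance_function`; that optimization step is not shown here.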
The authors are very grateful to the two anonymous referees for their detailed and helpful comments, which helped to finalize the manuscript.