Abstract
Probabilistic distance (PD) clustering method grounds on the basic assumption that the product between the probability of the unit belonging to a cluster and the distance between the unit and the cluster center is constant, for each statistical unit. This constant is a measure of the classificability of the point, and the sum of the constant over units is referred to as the joint distance function (JDF). The parameters that minimize the JDF maximize the classificability of the units. The goal of this paper is to introduce a new distance measure based on a probability density function, specifically, we use the multivariate Gaussian and Student-t distributions. We show using two simulated data sets that the use of a distance based on these two density functions improves the performance of PD clustering.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Andrews, J.L., Wickins, J.R., Boers, N.M., McNicholas, P.D.: teigen: an R package for model-based clustering and classification via the multivariate t distribution. J. Stat. Softw. 83, 1–32 (2017)
Ben-Israel, A., Iyigun, C.: Probabilistic d-clustering. J. Classif. 25, 5–26 (2008)
Bezdek, J.C., Ehrlich, R., Full, W.: FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10, 191–203 (1984)
Browne, R.P., ElSherbiny, A., McNicholas, P.D.: FCM: mixture: Mixture Models for Clustering and Classification. R package version 1.4 (2015). https://cran.r-project.org/web/packages/mixture/index.html
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B-met Ser. B 39, 1–38 (1977)
Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Cluster Analysis. Wiley Series in Probability and Statistics. Wiley, New York (2011)
Genz, A., Bretz, F., Miwa, T., Mi, X., Leisch, F., Scheipl, F., Hothorn, T.: mvtnorm: multivariate normal and t distributions. R package version 1.0-7 (2009). https://cran.r-project.org/web/packages/mvtnorm/index.html
Gordon, A.D.: Classification, 2nd edn. Chapman and Hall/CRC, Boca Raton (1999)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)
Iyigun, C.: Probabilistic distance clustering. Ph.D. thesis, State University of New Jersey (2007)
Iyigun, C., Ben-Israel, A.: Probabilistic distance clustering adjusted for cluster size. Probab. Eng. Inform. Sci. 22, 68–125 (2008)
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium, vol. 1, pp. 281–297 (1967)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Interscience, New York (2000)
R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2016)
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971)
Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 2nd edn. Academic Press, New York (2003)
Tortora, C., McNicholas, P.D.: FPDclustering: PD-clustering and factor PD-clustering. R package version 1.1 (2016). https://cran.r-project.org/web/packages/FPDclustering/index.html
Tortora, C., Gettler-Summa, M., Marino, M., Palumbo, F.: Factor probabilistic distance clustering (FPDC): a new clustering method. Adv. Data Anal. Classif. 10, 441–464 (2016)
Acknowledgements
The authors are very grateful to the two anonymous referees for their detailed and helpful comments to finalize the manuscript.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Rainey, C., Tortora, C., Palumbo, F. (2019). A Parametric Version of Probabilistic Distance Clustering. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-21140-0_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21139-4
Online ISBN: 978-3-030-21140-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)