A Parametric Version of Probabilistic Distance Clustering

  • Christopher Rainey
  • Cristina Tortora
  • Francesco Palumbo
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

The probabilistic distance (PD) clustering method rests on the assumption that, for each statistical unit, the product of the probability that the unit belongs to a cluster and the distance between the unit and the cluster center is constant. This constant measures the classifiability of the point, and the sum of the constants over all units is called the joint distance function (JDF). The parameters that minimize the JDF maximize the classifiability of the units. The goal of this paper is to introduce a new distance measure based on a probability density function; specifically, we use the multivariate Gaussian and Student-t distributions. Using two simulated data sets, we show that a distance based on these two density functions improves the performance of PD clustering.
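
The constant-product assumption can be made concrete with a small sketch. If p_k · d_k equals the same constant for every cluster k and the probabilities sum to one, then p_k is proportional to 1/d_k and the constant equals 1 / Σ_j (1/d_j). The Python below illustrates only this relationship. It assumes plain Euclidean distance and fixed, known centers; it does not implement the density-based distance or the estimation procedure proposed in the paper. The function names `pd_memberships` and `joint_distance_function` are hypothetical.

```python
import math


def pd_memberships(point, centers):
    """Return (probabilities, constant) for one point such that
    probabilities[k] * distance[k] == constant for every cluster k.

    Assumption: Euclidean distance to fixed centers (illustration only).
    """
    dists = [math.dist(point, c) for c in centers]

    # A point lying exactly on one or more centers is assigned to them
    # with certainty; the constant is then zero.
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        n_zero = sum(zeros)
        return [1.0 / n_zero if z else 0.0 for z in zeros], 0.0

    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    probs = [w / total for w in inverse]  # p_k proportional to 1 / d_k
    constant = 1.0 / total                # equals p_k * d_k for every k
    return probs, constant


def joint_distance_function(points, centers):
    """Sum of the per-point constants over all units (the JDF)."""
    return sum(pd_memberships(x, centers)[1] for x in points)


if __name__ == "__main__":
    centers = [(0.0, 0.0), (4.0, 0.0)]
    probs, constant = pd_memberships((1.0, 0.0), centers)
    print(probs, constant)
    print(joint_distance_function([(1.0, 0.0), (3.0, 0.0)], centers))
```

For a point at distance 1 from the first center and 3 from the second, the memberships are 0.75 and 0.25, and both products equal 0.75. A point that is equidistant from all centers gets equal probabilities and a larger constant, which is why a smaller JDF corresponds to units that are easier to classify.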

Keywords

PD clustering, Clustering algorithm, Gaussian distribution, Multivariate Student-t distribution

Notes

Acknowledgements

The authors are very grateful to the two anonymous referees for their detailed and helpful comments, which helped finalize the manuscript.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Christopher Rainey (1)
  • Cristina Tortora (1)
  • Francesco Palumbo (2, corresponding author)
  1. Department of Mathematics and Statistics, San José State University, San Jose, USA
  2. Department of Political Sciences, University of Naples Federico II, Napoli, Italy
