A Parametric Version of Probabilistic Distance Clustering

  • Conference paper
Statistical Learning of Complex Data (CLADAG 2017)

Abstract

The probabilistic distance (PD) clustering method rests on the basic assumption that, for each statistical unit, the product of the probability of the unit belonging to a cluster and the distance between the unit and the cluster center is constant. This constant is a measure of the classificability of the point, and the sum of these constants over all units is referred to as the joint distance function (JDF). The parameters that minimize the JDF maximize the classificability of the units. The goal of this paper is to introduce a new distance measure based on a probability density function; specifically, we use the multivariate Gaussian and Student-t distributions. Using two simulated data sets, we show that a distance based on these two density functions improves the performance of PD clustering.
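
The constant-product assumption can be made concrete with a small sketch. If p_k * d_k = c for every cluster k and the probabilities sum to one, then p_k = c / d_k and c = 1 / sum_j (1 / d_j). The Python below is only an illustration under stated assumptions: it uses plain Euclidean distance (not the density-based distances the paper proposes), it assumes the point does not coincide with any center, and the function names `pd_memberships` and `joint_distance_function` are hypothetical, not taken from the paper or from any package.

```python
import math


def pd_memberships(point, centers):
    """Return (probabilities, constant) for one point.

    Illustrative assumption: Euclidean distance to each center.
    The PD assumption p_k * d_k = c, together with sum(p_k) = 1,
    gives c = 1 / sum(1 / d_k) and p_k = c / d_k.
    """
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        raise ValueError("point coincides with a center; this sketch does not handle that case")
    const = 1.0 / sum(1.0 / d for d in dists)
    probs = [const / d for d in dists]
    return probs, const


def joint_distance_function(points, centers):
    """Sum the per-point constants over all points (the JDF)."""
    return sum(pd_memberships(x, centers)[1] for x in points)


# Hypothetical toy data: two centers on the x-axis and two points.
centers = [(0.0, 0.0), (4.0, 0.0)]
points = [(1.0, 0.0), (2.0, 0.0)]

probs, const = pd_memberships(points[0], centers)
# The distances are 1 and 3, so each product p_k * d_k equals the same constant.
products = [p * math.dist(points[0], c) for p, c in zip(probs, centers)]
jdf = joint_distance_function(points, centers)
```

In this toy case the point at (1, 0) is three times closer to the first center than to the second, so it receives three times the membership probability for that cluster. A full PD clustering procedure would also update the centers iteratively to reduce the JDF, which this sketch does not attempt.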

Acknowledgements

The authors are very grateful to the two anonymous referees for their detailed and helpful comments to finalize the manuscript.

Author information

Corresponding author

Correspondence to Francesco Palumbo.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Rainey, C., Tortora, C., Palumbo, F. (2019). A Parametric Version of Probabilistic Distance Clustering. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_4
