A Parametric Version of Probabilistic Distance Clustering

Rainey, Christopher; Tortora, Cristina; Palumbo, Francesco

doi:10.1007/978-3-030-21140-0_4

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Included in the following conference series:

Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society

Abstract

The probabilistic distance (PD) clustering method rests on the assumption that, for each statistical unit, the product of the probability that the unit belongs to a cluster and the distance between the unit and the cluster center is constant. This constant measures the classifiability of the point, and the sum of the constants over all units is referred to as the joint distance function (JDF). The parameters that minimize the JDF maximize the classifiability of the units. The goal of this paper is to introduce a new distance measure based on a probability density function; specifically, we use the multivariate Gaussian and Student-t distributions. Using two simulated data sets, we show that a distance based on these two density functions improves the performance of PD clustering.
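The constancy assumption stated in the abstract determines the membership probabilities in closed form. For one unit with strictly positive distances d_1, ..., d_K to the K cluster centers, requiring p_k d_k = F for every k together with p_1 + ... + p_K = 1 gives p_k = (1/d_k) / Σ_j (1/d_j) and F = 1 / Σ_j (1/d_j). The following is a minimal sketch of that calculation, not the authors' implementation. The function names `pd_memberships` and `joint_distance_function` are hypothetical, and Euclidean distance is used only for illustration; the density-based distance proposed in the paper is not specified in this abstract and is not reproduced here.

```python
import numpy as np


def pd_memberships(distances):
    """Membership probabilities and per-unit constant F under p_k * d_k = F.

    distances: array of shape (n_units, n_clusters), strictly positive.
    Returns (p, F): p has the same shape as distances, F has shape (n_units,).
    """
    d = np.asarray(distances, dtype=float)
    inv = 1.0 / d                              # 1 / d_k for every unit and cluster
    inv_sum = inv.sum(axis=1, keepdims=True)   # sum_j 1 / d_j for every unit
    p = inv / inv_sum                          # rows sum to one
    F = 1.0 / inv_sum[:, 0]                    # the constant p_k * d_k of each unit
    return p, F


def joint_distance_function(distances):
    """JDF: the sum over units of the per-unit constant F."""
    _, F = pd_memberships(distances)
    return F.sum()


# Illustrative data (assumed, not from the paper): two units, two centers.
points = np.array([[0.0, 1.0], [4.0, 4.0]])
centers = np.array([[0.0, 0.0], [4.0, 3.0]])

# Euclidean distance from every unit to every center, shape (2, 2).
dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)

p, F = pd_memberships(dist)
jdf = joint_distance_function(dist)
```

In this sketch a unit close to one center and far from the others receives a probability near one for that cluster and a small F, which is the sense in which a smaller JDF corresponds to more classifiable units.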

Acknowledgements

The authors are very grateful to the two anonymous referees for their detailed and helpful comments to finalize the manuscript.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, San José State University, San Jose, CA, USA
Christopher Rainey & Cristina Tortora
Department of Political Sciences, University of Naples Federico II, Napoli, Italy
Francesco Palumbo

Corresponding author

Correspondence to Francesco Palumbo.

Editor information

Editors and Affiliations

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
Francesca Greselin
Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
Laura Deldossi
Department of Economic and Social Sciences, Università Cattolica del Sacro Cuore, Piacenza, Italy
Luca Bagnato
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maurizio Vichi

About this paper

Cite this paper

Rainey, C., Tortora, C., Palumbo, F. (2019). A Parametric Version of Probabilistic Distance Clustering. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_4

DOI: https://doi.org/10.1007/978-3-030-21140-0_4
Published: 07 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21139-4
Online ISBN: 978-3-030-21140-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
