Robustness and Stability Analysis of Factor PD-Clustering on Large Social Data Sets

Tortora, Cristina; Marino, Marina

doi:10.1007/978-3-319-06692-9_29

Conference paper
First Online: 01 January 2014

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Factor clustering methods have been proposed for clustering large data sets. Among them, factor probabilistic distance clustering (FPDC) shows interesting performance. The method is based on two main steps: a Tucker3 decomposition of the distance array and probabilistic distance (PD) clustering on the resulting factors. The aim of this paper is to apply FPDC to large behavioral and social data sets in order to obtain homogeneous and well-separated clusters of individuals. The goal is to evaluate the stability and robustness of the method on such data sets. Stability refers to the invariance of the results across iterations of the method. Robustness refers to the sensitivity of the method to errors in the data. Both characteristics are evaluated using bootstrap resampling.
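
The bootstrap evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it omits the Tucker3 step entirely and runs plain PD-clustering on the raw data, using the usual PD principle that, for each point, membership probability times distance to the center is constant across clusters. The function names, the random initialization inside the data range, the Rand index as agreement measure, and all parameter defaults are assumptions made for illustration only.

```python
import numpy as np


def _memberships(X, centers):
    """Distances (n, k) and PD memberships p_ik, proportional to 1 / d_ik."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    inv = 1.0 / d
    return d, inv / inv.sum(axis=1, keepdims=True)


def pd_clustering(X, k, n_iter=200, tol=1e-6, seed=0):
    """Plain PD-clustering (no factor step). Returns centers and memberships."""
    rng = np.random.default_rng(seed)
    # Assumption: start from random points inside the bounding box of the data.
    centers = rng.uniform(X.min(axis=0), X.max(axis=0), size=(k, X.shape[1]))
    for _ in range(n_iter):
        d, p = _memberships(X, centers)
        u = p ** 2 / d  # weights used in the center update
        new_centers = (u.T @ X) / u.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(X, centers)[1]


def rand_index(a, b):
    """Share of point pairs on which two labelings agree (label-order free)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[upper] == same_b[upper]))


def bootstrap_stability(X, k, n_boot=20, seed=0):
    """Mean agreement between a reference partition and bootstrap partitions."""
    rng = np.random.default_rng(seed)
    _, ref_p = pd_clustering(X, k, seed=seed)
    ref_labels = ref_p.argmax(axis=1)
    scores = []
    for b in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # rows drawn with replacement
        centers_b, _ = pd_clustering(X[idx], k, seed=seed + b + 1)
        # Classify the original points with the centers found on the resample.
        labels_b = _memberships(X, centers_b)[1].argmax(axis=1)
        scores.append(rand_index(ref_labels, labels_b))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data, two Gaussian groups; purely illustrative.
    X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                   rng.normal(5.0, 0.5, size=(50, 2))])
    print("bootstrap stability:", bootstrap_stability(X, k=2, n_boot=10))
```

A score close to 1 means the partition changes little when the data are resampled. A robustness check in the same spirit could add noise or outliers to each resample before clustering, which this sketch does not do.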

Author information

Authors and Affiliations

University of Guelph, 50 Stone Road East, Guelph, ON, Canada
Cristina Tortora
Università degli Studi di Napoli Federico II, via Cinthia 40, Napoli, Italy
Marina Marino

Corresponding author

Correspondence to Cristina Tortora.

Editor information

Editors and Affiliations

Department of Statistical Science, University of Rome "La Sapienza", Rome, Italy
Donatella Vicari
and Information Sciences, Tama University Graduate School of Management, Tokyo, Japan
Akinori Okada
Department of Political Science, University of Naples "Federico II", Naples, Italy
Giancarlo Ragozini
Fakultät Statistik, Technische Universität Dortmund, Dortmund, Germany
Claus Weihs

About this paper

Cite this paper

Tortora, C., Marino, M. (2014). Robustness and Stability Analysis of Factor PD-Clustering on Large Social Data Sets. In: Vicari, D., Okada, A., Ragozini, G., Weihs, C. (eds) Analysis and Modeling of Complex Data in Behavioral and Social Sciences. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-06692-9_29

DOI: https://doi.org/10.1007/978-3-319-06692-9_29
Published: 17 June 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06691-2
Online ISBN: 978-3-319-06692-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)