Abstract
A new cluster algorithm based on the SAR procedure proposed by Peña and Tiao [9] is presented. The method splits the data into more homogeneous groups by putting together observations that have the same sensitivity to the deletion of extreme points in the sample. Because this method always splits the sample, a second stage checks whether observations outside each group can be recombined into it, one by one, using the distance implied by the model. The performance of the algorithm is compared with that of some well-known cluster methods.
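As a rough illustration of the second stage only, the recombination check might look like the sketch below for a univariate sample. It assumes a normal model, so that the distance implied by the model is the standardized distance from the group mean. The function name `recombine`, the cutoff `threshold`, and the sample values are hypothetical choices made for this example; they are not taken from the SAR procedure of Peña and Tiao [9], and the splitting stage is not shown.

```python
from statistics import mean, stdev


def recombine(group, outside, threshold=3.0):
    """Greedy one-at-a-time recombination for a univariate sample.

    group:     observations already assigned to the group (at least two values)
    outside:   candidate observations currently outside the group
    threshold: hypothetical cutoff on the standardized distance (an assumption)
    """
    group = list(group)
    remaining = list(outside)
    while remaining:
        # Re-estimate the group parameters after every admission.
        m, s = mean(group), stdev(group)
        if s == 0:
            break  # degenerate group: the distance is undefined
        # Candidate closest to the group under the current estimates.
        nearest = min(remaining, key=lambda x: abs(x - m) / s)
        if abs(nearest - m) / s > threshold:
            break  # no remaining candidate is compatible with the group
        group.append(nearest)
        remaining.remove(nearest)
    return group, remaining


# Illustrative data: two nearby points and one distant point.
group, rest = recombine([9.8, 10.1, 10.0, 9.9, 10.2], [10.3, 9.7, 25.0])
```

With these values the two nearby points are absorbed and 25.0 stays outside. In a multivariate setting the same loop would presumably use a Mahalanobis-type distance with a re-estimated covariance matrix, but that extension is also an assumption here.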
References
Atkinson A.C. (1994). Fast very robust methods for detection of multiple outliers. Journal of the American Statistical Association 89, 1329–1339.
Box G.E.P., Tiao G.C. (1973). Bayesian inference in statistical analysis. Addison-Wesley.
Banfield J.D., Raftery A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821.
Cuesta-Albertos J.A., Gordaliza A., Matrán C. (1997). Trimmed k-means: an attempt to robustify quantizers. The Annals of Statistics 25, 553–576.
Cuevas A., Febrero M., Fraiman R. (2000). Estimating the number of clusters. Canadian Journal of Statistics 28, 367–382.
Dasgupta A., Raftery A.E. (1998). Detecting features in spatial point processes with clutter via model-based clustering. Journal of the American Statistical Association 93, 294–302.
Fraley C., Raftery A.E. (1999). MCLUST: Software for model-based cluster analysis. Journal of Classification 16, 297–306.
Gordon A. (1999). Classification. 2nd edn. London: Chapman and Hall/CRC.
Peña D., Tiao G.C. (2003). The SAR procedure: A diagnostic analysis of heterogeneous data. (Manuscript submitted for publication).
Peña D., Rodriguez J., Tiao G.C. (2004). Cluster analysis by the SAR procedure (Manuscript submitted for publication).
Peña D., Prieto J. (2001). Cluster identification using projections. Journal of the American Statistical Association 96, 1433–1445.
Richardson S., Green P.J. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society B 59, 731–758.
Rousseeuw P.J., Leroy A.M. (1987). Robust regression and outlier detection. New York: John Wiley.
Stephens M. (2000). Bayesian analysis of mixture models with an unknown number of components-an alternative to reversible jump methods. The Annals of Statistics 28, 40–74.
Struyf A., Hubert M., Rousseeuw P.J. (1997). Integrating robust clustering techniques in S-PLUS. Computational Statistics and Data Analysis 26, 17–37.
Tibshirani R., Walther G., Hastie T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society B 63, 411–423.
© 2004 Springer-Verlag Berlin Heidelberg
Peña, D., Rodríguez, J., Tiao, G.C. (2004). A General Partition Cluster Algorithm. In: Antoch, J. (eds) COMPSTAT 2004 — Proceedings in Computational Statistics. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-2656-2_30
DOI: https://doi.org/10.1007/978-3-7908-2656-2_30
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-1554-2
Online ISBN: 978-3-7908-2656-2
eBook Packages: Springer Book Archive