Abstract

A new cluster algorithm based on the SAR procedure proposed by Peña and Tiao [9] is presented. The method splits the data into more homogeneous groups by putting together observations that have the same sensitivity to the deletion of extreme points in the sample. Because this method always splits the sample, a second stage checks whether observations outside each group can be recombined into it, one by one, using the distance implied by the model. The performance of the algorithm is compared with that of some well-known cluster methods.
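
The recombination stage can be illustrated with a minimal sketch. The abstract does not state which distance or stopping rule is used, so everything here is an assumption: groups are treated as Gaussian, the "distance implied by the model" is taken to be the squared Mahalanobis distance, and the cutoff is supplied by the caller. The names `mahalanobis_sq` and `recombine` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of point x from a group (mean, cov)."""
    diff = x - mean
    return float(diff @ np.linalg.solve(cov, diff))


def recombine(data, labels, group, threshold):
    """Try to absorb points outside `group` into it, one at a time.

    Assumption (not from the paper): on each pass the closest outside
    point is added if its squared Mahalanobis distance does not exceed
    `threshold`; the group mean and covariance are then re-estimated.
    The loop stops as soon as the closest outside point is too far.
    Requires at least two dimensions and enough group members for a
    non-singular covariance matrix.
    """
    labels = labels.copy()  # leave the caller's labels untouched
    while True:
        members = data[labels == group]
        outside = np.flatnonzero(labels != group)
        if outside.size == 0:
            break
        mean = members.mean(axis=0)
        cov = np.cov(members, rowvar=False)
        d2 = np.array([mahalanobis_sq(data[i], mean, cov) for i in outside])
        best = int(np.argmin(d2))
        if d2[best] > threshold:
            break
        labels[outside[best]] = group
    return labels
```

For bivariate data, one possible cutoff is 9.21, approximately the 0.99 quantile of a chi-square distribution with 2 degrees of freedom. This choice is also an assumption rather than the authors' rule.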

References

  1. Atkinson A.C. (1994). Fast very robust methods for detection of multiple outliers. Journal of the American Statistical Association 89, 1329–1339.

  2. Box G.E.P., Tiao G.C. (1973). Bayesian inference in statistical analysis. Addison-Wesley.

  3. Banfield J.D., Raftery A. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821.

  4. Cuesta-Albertos J.A., Gordaliza A., Matrán C. (1997). Trimmed k-means: an attempt to robustify quantizers. The Annals of Statistics 25, 553–576.

  5. Cuevas A., Febrero M., Fraiman R. (2000). Estimating the number of clusters. Canadian Journal of Statistics 28, 367–382.

  6. Dasgupta A., Raftery A.E. (1998). Detecting features in spatial point processes with clutter via model-based clustering. Journal of the American Statistical Association 93, 294–302.

  7. Fraley C., Raftery A.E. (1999). MCLUST: Software for model-based cluster analysis. Journal of Classification 16, 297–306.

  8. Gordon A. (1999). Classification. 2nd edn. London: Chapman and Hall/CRC.

  9. Peña D., Tiao G.C. (2003). The SAR procedure: A diagnostic analysis of heterogeneous data. (Manuscript submitted for publication).

  10. Peña D., Rodríguez J., Tiao G.C. (2004). Cluster analysis by the SAR procedure. (Manuscript submitted for publication).

  11. Peña D., Prieto J. (2001). Cluster identification using projections. Journal of the American Statistical Association 96, 1433–1445.

  12. Richardson S., Green P.J. (1997). On Bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society B 59, 731–758.

  13. Rousseeuw P.J., Leroy A.M. (1987). Robust regression and outlier detection. New York: John Wiley.

  14. Stephens M. (2000). Bayesian analysis of mixture models with an unknown number of components: an alternative to reversible jump methods. The Annals of Statistics 28, 40–74.

  15. Struyf A., Hubert M., Rousseeuw P.J. (1997). Integrating robust clustering techniques in S-PLUS. Computational Statistics and Data Analysis 26, 17–37.

  16. Tibshirani R., Walther G., Hastie T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society B 63, 411–423.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peña, D., Rodríguez, J., Tiao, G.C. (2004). A General Partition Cluster Algorithm. In: Antoch, J. (eds) COMPSTAT 2004 — Proceedings in Computational Statistics. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-2656-2_30

  • DOI: https://doi.org/10.1007/978-3-7908-2656-2_30

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-1554-2

  • Online ISBN: 978-3-7908-2656-2

  • eBook Packages: Springer Book Archive
