Soft Bootstrapping in Cluster Analysis and Its Comparison with Other Resampling Methods

  • Conference paper
Data Analysis, Machine Learning and Knowledge Discovery

Abstract

The bootstrap approach resamples with replacement from the original data. Here we consider sampling from the empirical distribution of a given data set in order to investigate the stability of cluster analysis results. Concretely, the original bootstrap technique can be formulated by choosing the following weights of observations: m_i = n if the corresponding object i is drawn n times, and m_i = 0 otherwise. We call these weights of observations masses. In this paper, we present another bootstrap method, called soft bootstrapping, which consists of randomly changing the “bootstrap masses” to some degree. Soft bootstrapping can be applied to any cluster analysis method that makes use, directly or indirectly, of weights of observations. This resampling scheme is especially appropriate for small sample sizes because no object is totally excluded from the soft bootstrap sample. Finally, we compare different resampling techniques with respect to cluster analysis.
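The mass formulation in the abstract can be illustrated with a minimal Python sketch. The first function follows the abstract directly: the mass of object i is the number of times it is drawn. The abstract does not give the formula for soft bootstrapping, so the second function is only an assumed stand-in: it blends each bootstrap mass with the uniform mass 1, which keeps every mass strictly positive for a blending factor below 1. The names `bootstrap_masses`, `soften_masses`, and `softness` are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def bootstrap_masses(num_objects, rng):
    """Classical bootstrap masses: m_i = number of times object i is drawn."""
    # Draw num_objects indices with replacement from 0..num_objects-1.
    draws = rng.choices(range(num_objects), k=num_objects)
    counts = Counter(draws)
    # Objects that were never drawn get mass 0.
    return [counts.get(i, 0) for i in range(num_objects)]


def soften_masses(masses, softness):
    """Assumed softening rule (NOT the paper's formula).

    Blends each bootstrap mass with the uniform mass 1:
        soft_i = (1 - softness) * 1 + softness * m_i
    softness = 1 gives back the classical bootstrap masses;
    softness < 1 leaves no object with mass 0.
    """
    if not 0.0 <= softness <= 1.0:
        raise ValueError("softness must lie in [0, 1]")
    return [(1.0 - softness) + softness * m for m in masses]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hard = bootstrap_masses(10, rng)
soft = soften_masses(hard, softness=0.8)
print("bootstrap masses:", hard)
print("softened masses: ", soft)
```

Under this assumed rule the softened masses still sum to the number of objects, so they could be passed as observation weights to any clustering routine that accepts weights, in the same way as the classical bootstrap masses.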

Author information

Corresponding author

Correspondence to Hans-Joachim Mucha.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Mucha, HJ., Bartel, HG. (2014). Soft Bootstrapping in Cluster Analysis and Its Comparison with Other Resampling Methods. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_11
