Soft Bootstrapping in Cluster Analysis and Its Comparison with Other Resampling Methods

  • Hans-Joachim Mucha
  • Hans-Georg Bartel
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

The bootstrap approach is resampling with replacement from the original data. Here we consider sampling from the empirical distribution of a given data set in order to investigate the stability of results of cluster analysis. Concretely, the original bootstrap technique can be formulated by assigning the following weights to the observations: m_i = n if the corresponding object i is drawn n times, and m_i = 0 otherwise. We call these weights of observations masses. In this paper we present another bootstrap method, called soft bootstrapping, which consists of randomly changing the bootstrap masses to some degree. Soft bootstrapping can be applied to any cluster analysis method that makes (directly or indirectly) use of weights of observations. This resampling scheme is especially appropriate for small sample sizes because no object is totally excluded from the soft bootstrap sample. Finally, we compare different resampling techniques with respect to cluster analysis.
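The idea of bootstrap masses, and one possible softening of them, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the softening rule shown here (a convex blend of the integer bootstrap masses with uniform masses, controlled by a hypothetical parameter `alpha`) is an assumption for demonstration; the paper's exact scheme for randomly changing the masses may differ. What it preserves is the key property stated in the abstract: every object keeps a strictly positive mass.

```python
import numpy as np

def bootstrap_masses(n, rng):
    # Standard bootstrap: draw n indices with replacement and count how
    # often each object appears. These counts are the "bootstrap masses"
    # m_i; objects not drawn receive mass 0.
    idx = rng.integers(0, n, size=n)
    return np.bincount(idx, minlength=n)

def soft_bootstrap_masses(n, rng, alpha=0.5):
    # Hypothetical softening (illustrative assumption, not from the paper):
    # blend the integer bootstrap masses with the uniform mass 1. For
    # 0 <= alpha < 1 no object's mass drops to zero, and the total mass
    # remains n, so weighted clustering methods can use the masses directly.
    hard = bootstrap_masses(n, rng)
    return alpha * hard + (1.0 - alpha) * np.ones(n)

rng = np.random.default_rng(0)
m = soft_bootstrap_masses(10, rng)
```

Any clustering method that accepts observation weights (e.g. weighted k-means or weighted hierarchical clustering) could then be run repeatedly on such mass vectors to assess the stability of the resulting partitions.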

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Berlin, Germany
  2. Department of Chemistry, Humboldt University Berlin, Berlin, Germany