Soft Bootstrapping in Cluster Analysis and Its Comparison with Other Resampling Methods

Mucha, Hans-Joachim; Bartel, Hans-Georg

doi:10.1007/978-3-319-01595-8_11

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

The bootstrap resamples with replacement from the original data. Here we consider sampling from the empirical distribution of a given data set in order to investigate the stability of cluster analysis results. Concretely, the original bootstrap technique can be formulated by choosing the following weights of observations: m_i = n if the corresponding object i is drawn n times, and m_i = 0 otherwise. We call these weights of observations masses. In this paper, we present another bootstrap method, called soft bootstrapping, which consists of randomly changing the “bootstrap masses” to some degree. Soft bootstrapping can be applied to any cluster analysis method that makes use, directly or indirectly, of weights of observations. This resampling scheme is especially appropriate for small sample sizes because no object is totally excluded from the soft bootstrap sample. Finally, we compare different resampling techniques with respect to cluster analysis.
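
The idea of masses can be illustrated with a short Python sketch. The first function follows the abstract directly: the mass of an object is the number of times it is drawn. The abstract does not give the formula for soft bootstrapping, so `soft_bootstrap_masses` and its `softness` parameter are purely hypothetical. They blend unit masses with the bootstrap counts so that every mass stays positive, which is one possible reading of "random change of the bootstrap masses to some degree" and not necessarily the authors' definition. The helper `weighted_centroid` is likewise an illustrative assumption showing how a mass-aware clustering step might consume the weights.

```python
import numpy as np


def bootstrap_masses(n_objects, rng):
    """Ordinary bootstrap: m_i is the number of times object i is drawn."""
    draws = rng.integers(0, n_objects, size=n_objects)
    return np.bincount(draws, minlength=n_objects).astype(float)


def soft_bootstrap_masses(n_objects, rng, softness=0.5):
    """Hypothetical softening (an assumption, not the chapter's formula).

    Blends unit masses with bootstrap counts. For softness < 1 every
    object keeps a strictly positive mass, so none is excluded.
    """
    if not 0.0 <= softness < 1.0:
        raise ValueError("softness must lie in [0, 1)")
    counts = bootstrap_masses(n_objects, rng)
    return (1.0 - softness) + softness * counts


def weighted_centroid(X, masses):
    """Mass-weighted mean of the rows of X (illustrative helper)."""
    return (masses[:, None] * X).sum(axis=0) / masses.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 2))  # small toy data set
    hard = bootstrap_masses(len(X), rng)
    soft = soft_bootstrap_masses(len(X), rng, softness=0.5)
    print("hard masses:", hard)
    print("soft masses:", soft)
    print("soft-weighted centroid:", weighted_centroid(X, soft))
```

With the ordinary bootstrap some masses are typically zero, which can remove a sizeable share of a small sample. Under the blend assumed here the total mass still equals the number of objects, but each object contributes at least `1 - softness`.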

Author information

Authors and Affiliations

Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Mohrenstraße 39, 10117, Berlin, Germany
Hans-Joachim Mucha
Department of Chemistry, Humboldt University Berlin, Brook-Taylor-Straße 2, 12489, Berlin, Germany
Hans-Georg Bartel

Corresponding author

Correspondence to Hans-Joachim Mucha.

Editor information

Editors and Affiliations

Faculty of Computer Science, Otto-von-Guericke-Universität Magdeburg, Magdeburg, Germany
Myra Spiliopoulou
Institute of Computer Science, University of Hildesheim, Hildesheim, Germany
Lars Schmidt-Thieme
Institute of Computer Science, University of Hildesheim, Hildesheim, Germany
Ruth Janning

About this paper

Cite this paper

Mucha, H.-J., Bartel, H.-G. (2014). Soft Bootstrapping in Cluster Analysis and Its Comparison with Other Resampling Methods. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_11

DOI: https://doi.org/10.1007/978-3-319-01595-8_11
Published: 10 October 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01594-1
Online ISBN: 978-3-319-01595-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
