Bottom-Up Variable Selection in Cluster Analysis Using Bootstrapping: A Proposal

Mucha, Hans-Joachim; Bartel, Hans-Georg

doi:10.1007/978-3-319-25226-1_11

Conference paper
First Online: 04 August 2016

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Variable selection is a problem of increasing interest in many areas of multivariate statistics, such as classification, clustering, and regression. In contrast to supervised classification, variable selection in cluster analysis is a much more difficult problem because usually nothing is known about the true class structure. In addition, variable selection in clustering is closely related to the central problem of determining the number of clusters K inherent in the data. Here we present a very general bottom-up approach to variable selection in clustering that starts with univariate investigations of stability. The hope is that the structure of interest is contained in only a small subset of the variables. By "very general" we mean that we use only non-parametric resampling techniques for validation, looking for clusters that can be reproduced to a high degree under resampling schemes. Our proposed technique can therefore be applied to almost any cluster analysis method.
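The univariate starting point described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: k-means on a single variable as the clustering method, K = 2, the adjusted Rand index as the measure of reproducibility, and the hypothetical helper names `kmeans_1d`, `univariate_stability`, and `rank_variables`. Each variable is clustered on its own, the clustering is repeated on bootstrap samples, and the variable is scored by how well its original partition is reproduced.

```python
import random
from collections import Counter


def nearest(v, centers):
    """Index of the centre closest to the value v."""
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))


def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on one variable; returns the sorted cluster centres."""
    s = sorted(values)
    # Deterministic start (an assumption): centres at evenly spaced quantiles.
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centers)].append(v)
        # Keep the old centre if a cluster happens to be empty.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return sorted(centers)


def adjusted_rand(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    def comb2(n):
        return n * (n - 1) / 2
    joint = sum(comb2(c) for c in Counter(zip(a, b)).values())
    rows = sum(comb2(c) for c in Counter(a).values())
    cols = sum(comb2(c) for c in Counter(b).values())
    expected = rows * cols / comb2(len(a))
    maximum = (rows + cols) / 2
    if maximum == expected:  # both partitions are trivial
        return 1.0
    return (joint - expected) / (maximum - expected)


def univariate_stability(values, k=2, n_boot=50, seed=0):
    """Mean agreement between the original partition of one variable and
    the partitions induced by clusterings of bootstrap samples."""
    rng = random.Random(seed)
    n = len(values)
    base_centers = kmeans_1d(values, k)
    base = [nearest(v, base_centers) for v in values]
    scores = []
    for _ in range(n_boot):
        # Non-parametric bootstrap: draw n observations with replacement.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        centers = kmeans_1d(sample, k)
        # Assign the original observations to the bootstrap centres.
        labels = [nearest(v, centers) for v in values]
        scores.append(adjusted_rand(base, labels))
    return sum(scores) / n_boot


def rank_variables(data, k=2, n_boot=50, seed=0):
    """Rank variables (dict: name -> list of values) by univariate stability."""
    scores = {name: univariate_stability(col, k, n_boot, seed)
              for name, col in data.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A bottom-up search would then start from the most stable variable and add further variables only while the stability of the resulting partition does not deteriorate. That continuation, the choice of K, and any acceptance threshold are left open in this sketch.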

Author information

Authors and Affiliations

Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Mohrenstraße 39, 10117, Berlin, Germany
Hans-Joachim Mucha
Department of Chemistry, Humboldt University, Brook-Taylor-Straße 2, 12489, Berlin, Germany
Hans-Georg Bartel

Corresponding author

Correspondence to Hans-Joachim Mucha.

Editor information

Editors and Affiliations

Jacobs University Bremen, Bremen, Germany
Adalbert F.X. Wilhelm
Institute of Medical Systems Biology, Universität Ulm, Ulm, Baden-Württemberg, Germany
Hans A. Kestler

Cite this paper

Mucha, HJ., Bartel, HG. (2016). Bottom-Up Variable Selection in Cluster Analysis Using Bootstrapping: A Proposal. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_11

DOI: https://doi.org/10.1007/978-3-319-25226-1_11
Published: 04 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25224-7
Online ISBN: 978-3-319-25226-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
