Supervised clustering of variables

Chen, Mingkun; Vigneau, Evelyne

doi:10.1007/s11634-014-0191-5

Regular Article
Published: 15 November 2014

Volume 10, pages 85–101 (2016)

Advances in Data Analysis and Classification

Mingkun Chen¹ & Evelyne Vigneau¹

512 Accesses
3 Citations

Abstract

In predictive modelling, highly correlated predictors lead to unstable models that are often difficult to interpret. Two common strategies are feature selection and the use of latent components that reduce the complexity among correlated observed variables. The procedure we advocate here aims to achieve both purposes: to highlight the group structure among the variables and to identify the groups of variables most relevant for prediction. It is an iterative adaptation of a method developed for the clustering of variables around latent variables (CLV). Modifying the standard CLV algorithm yields a supervised procedure, in the sense that the variable to be predicted plays an active role in the clustering. The latent variables associated with the groups of variables, selected for their “proximity” to the variable to be predicted and their “internal homogeneity”, are progressively added to a predictive model. The features of the methodology are illustrated through a simulation study and a real-world application.
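
The general idea of grouping variables around latent variables and then ranking the groups for prediction can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure of the article: it alternates between assigning each variable to the group whose latent variable it correlates with most strongly (in squared correlation) and recomputing each latent variable as the first principal component of its group, and only afterwards ranks the latent variables by squared correlation with the response. The supervised modification described in the abstract, in which the response takes part in the clustering itself, is not reproduced. All function names, the deterministic starting partition, and the simulated data are hypothetical choices made for the example.

```python
import numpy as np

def standardize(X):
    """Centre each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def group_latents(Xs, labels, n_groups):
    """Latent variable of each group: first principal component, unit variance."""
    latents = np.zeros((Xs.shape[0], n_groups))
    for k in range(n_groups):
        block = Xs[:, labels == k]
        if block.shape[1] > 0:  # an empty group keeps a zero latent variable
            u, _, _ = np.linalg.svd(block, full_matrices=False)
            latents[:, k] = u[:, 0] / u[:, 0].std()
    return latents

def partition_variables(X, n_groups, n_iter=50):
    """Alternate between updating latent variables and reassigning variables."""
    Xs = standardize(X)
    n, p = Xs.shape
    labels = np.arange(p) % n_groups  # assumption: simple deterministic start
    for _ in range(n_iter):
        latents = group_latents(Xs, labels, n_groups)
        r2 = (Xs.T @ latents / n) ** 2  # squared correlations, shape (p, n_groups)
        new_labels = r2.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, group_latents(Xs, labels, n_groups)

def rank_groups(latents, y):
    """Order groups by squared correlation of their latent variable with y."""
    ys = (y - y.mean()) / y.std()
    r2 = (latents.T @ ys / len(y)) ** 2
    return np.argsort(r2)[::-1], r2

def progressive_r2(latents, y, order):
    """R^2 of least-squares fits of y on the first 1, 2, ... ranked latent variables."""
    yc = y - y.mean()
    path = []
    for m in range(1, len(order) + 1):
        Z = latents[:, order[:m]]
        coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
        resid = yc - Z @ coef
        path.append(1.0 - (resid @ resid) / (yc @ yc))
    return path

# Simulated data (hypothetical): two blocks of five correlated variables,
# with the response driven by the factor behind the first block only.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.hstack([f1[:, None] + 0.3 * rng.normal(size=(n, 5)),
               f2[:, None] + 0.3 * rng.normal(size=(n, 5))])
y = 2.0 * f1 + 0.5 * rng.normal(size=n)

labels, latents = partition_variables(X, n_groups=2)
order, r2_with_y = rank_groups(latents, y)
r2_path = progressive_r2(latents, y, order)
print("group of each variable:", labels)
print("groups ranked by relevance to y:", order)
print("R^2 as latent variables are added:", np.round(r2_path, 3))
```

Ranking the groups after an unsupervised partition keeps the sketch short; it is a simplification, and a procedure in which the response influences the grouping would differ in how the groups are formed and selected.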

Author information

Authors and Affiliations

Sensometrics and Chemometrics Laboratory, LUNAM University, ONIRIS, Site de la Geraudiere, BP 82225, 44300, Nantes CEDEX 3, France
Mingkun Chen & Evelyne Vigneau

Corresponding author

Correspondence to Evelyne Vigneau.

About this article

Cite this article

Chen, M., Vigneau, E. Supervised clustering of variables. Adv Data Anal Classif 10, 85–101 (2016). https://doi.org/10.1007/s11634-014-0191-5

Received: 15 October 2013
Revised: 17 October 2014
Accepted: 27 October 2014
Published: 15 November 2014
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11634-014-0191-5
