Variable selection in model-based clustering and discriminant analysis with a regularization approach

Celeux, Gilles; Maugis-Rabusseau, Cathy; Sedki, Mohammed

doi:10.1007/s11634-018-0322-5

Regular Article
Published: 11 April 2018

Volume 13, pages 259–278 (2019)
Advances in Data Analysis and Classification

Abstract

Several methods for variable selection have been proposed in model-based clustering and classification. These make use of backward or forward procedures to define the roles of the variables. Unfortunately, such stepwise procedures are slow and the resulting algorithms inefficient when analyzing large data sets with many variables. In this paper, we propose an alternative regularization approach for variable selection in model-based clustering and classification. In our approach the variables are first ranked using a lasso-like procedure in order to avoid slow stepwise algorithms. Thus, the variable selection methodology of Maugis et al. (Comput Stat Data Anal 53:3872–3882, 2009b) can be efficiently applied to high-dimensional data sets.

Notes

SelvarClustIndep is implemented in C++ and is available at http://www.math.univ-toulouse.fr/~maugis/.
The SelvarMix R package is available at https://CRAN.R-project.org/package=SelvarMix.
The clustvarsel R package is available at https://CRAN.R-project.org/package=clustvarsel.

References

Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Bouveyron C, Brunet C (2014) Discriminative variable selection for clustering with the sparse Fisher-EM algorithm. Comput Stat 29:489–513
Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28(5):781–793
Celeux G, Maugis C, Martin-Magniette ML, Raftery AE (2014) Comparing model selection and regularization approaches to variable selection in model-based clustering. J Fr Stat Soc 155:57–71
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39(1):1–38
Fraiman R, Justel A, Svarc M (2008) Selection of variables for cluster analysis and classification rules. J Am Stat Assoc 103:1294–1303
Friedman J, Hastie T, Tibshirani R (2007) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3):432–441
Friedman J, Hastie T, Tibshirani R (2014) glasso: graphical lasso—estimation of Gaussian graphical models. https://CRAN.R-project.org/package=glasso. Accessed 22 July 2014
Gagnot S, Tamby JP, Martin-Magniette ML, Bitton F, Taconnat L, Balzergue S, Aubourg S, Renou JP, Lecharny A, Brunaud V (2008) CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform. Nucleic Acids Res 36(suppl 1):D986–D990
Galimberti G, Montanari A, Viroli C (2009) Penalized factor mixture analysis for variable selection in clustered data. Comput Stat Data Anal 53:4301–4310
Kim S, Song DKH, DeSarbo WS (2012) Model-based segmentation featuring simultaneous segment-level variable selection. J Mark Res 49:725–736
Law MH, Figueiredo MAT, Jain AK (2004) Simultaneous feature selection and clustering using mixture models. IEEE Trans Pattern Anal Mach Intell 26(9):1154–1166
Lebret R, Iovleff S, Langrognet F, Biernacki C, Celeux G, Govaert G (2015) Rmixmod: the R package of the model-based unsupervised, supervised and semi-supervised classification mixmod library. J Stat Softw 67(6):241–270
Lee H, Li J (2012) Variable selection for clustering by separability based on ridgelines. J Comput Graph Stat 21:315–337
Maugis C, Celeux G, Martin-Magniette ML (2009a) Variable selection for clustering with Gaussian mixture models. Biometrics 65(3):701–709
Maugis C, Celeux G, Martin-Magniette ML (2009b) Variable selection in model-based clustering: a general variable role modeling. Comput Stat Data Anal 53:3872–3882
Maugis C, Celeux G, Martin-Magniette ML (2011) Variable selection in model-based discriminant analysis. J Multivar Anal 102:1374–1387
Meinshausen N, Bühlmann P (2006) High-dimensional graphs and variable selection with the Lasso. Ann Stat 34(3):1436–1462
Murphy TB, Dean N, Raftery AE (2010) Variable selection and updating in model-based discriminant analysis for high-dimensional data with food authenticity applications. Ann Appl Stat 4:396–421
Nia VP, Davison AC (2012) High-dimensional Bayesian clustering with variable selection: the R package bclust. J Stat Softw 47(5):1–22
Pan W, Shen X (2007) Penalized model-based clustering with application to variable selection. J Mach Learn Res 8:1145–1164
Raftery AE, Dean N (2006) Variable selection for model-based clustering. J Am Stat Assoc 101(473):168–178
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Scrucca L, Raftery AE (2014) clustvarsel: a package implementing variable selection for model-based clustering in R. arXiv:1411.0606
Scrucca L, Fop M, Murphy TB, Raftery AE (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):289
Sun W, Wang J, Fang Y (2012) Regularized k-means clustering of high dimensional data and its asymptotic consistency. Electron J Stat 6:148–167
Tadesse MG, Sha N, Vannucci M (2005) Bayesian variable selection in clustering high-dimensional data. J Am Stat Assoc 100(470):602–617
Wang S, Zhu J (2008) Variable selection for model-based high-dimensional clustering and its application to microarray data. Biometrics 64(2):440–448
Xie B, Pan W, Shen X (2008) Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables. Electron J Stat 2:168–212
Zhou H, Pan W, Shen X (2009) Penalized model-based clustering with unconstrained covariance matrices. Electron J Stat 3:1473–1496

Funding

Funding was provided by Paris-Saclay-DIGITEO and ANR (Grant No. ANR-13-JS01-0001-01).

Author information

Authors and Affiliations

Dept. de mathématiques, Inria and Université Paris-Sud, Bâtiment 425, 91405, Orsay Cedex, France
Gilles Celeux
Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, INSA de Toulouse, 135 avenue de Rangueil, 31077, Toulouse Cedex 4, France
Cathy Maugis-Rabusseau
Paris-Sud University and INSERM U1181, Bâtiment 15/16, Hôpital Paul Brousse, 16 avenue Paul Vaillant Couturier, 94807, Villejuif Cedex, France
Mohammed Sedki

Corresponding author

Correspondence to Mohammed Sedki.

Procedures to maximize penalized empirical contrasts

1.1 The model-based clustering case

The EM algorithm for maximizing criterion (2) follows Zhou et al. (2009). The penalized complete log-likelihood of the centered data set $\bar{\mathbf {y}} = \big (\bar{\mathbf {y}}_1, \ldots , \bar{\mathbf {y}}_n \big )'$ is given by

$$\begin{aligned} L_{\text {c},(\lambda , \rho )} (\bar{\mathbf {y}}, \mathbf {z}, \alpha )= & {} \sum _{i =1}^n \sum _{k = 1}^K z_{ik} \big [ \ln (\pi _k) + \ln \phi (\bar{\mathbf {y}}_i \mid \mu _k, \varSigma _k)\big ] - \lambda \sum ^K_{k = 1} \left\| \mu _k\right\| _1 \nonumber \\&- \rho \sum _{k = 1}^K \left\| \varTheta _k\right\| _1 , \end{aligned}$$

(5)

where $\varTheta _k=\varSigma _k^{-1}$ denotes the precision matrix of the kth mixture component. The EM algorithm of Zhou et al. (2009) maximizes at each iteration the conditional expectation of (5) given $\bar{\mathbf {y}}$ and a current parameter vector $\alpha ^{(s)}$: ${\mathbb {E}}\Big [L_{\text {c},(\lambda , \rho )}\big (\bar{\mathbf {y}}, \mathbf {z}, \alpha \big ) \mid \bar{\mathbf {y}}, \alpha ^{(s)}\Big ].$ The following two steps are repeated from an initial $\alpha ^{(0)}$ until convergence. At the sth iteration of the EM algorithm:

E-step: The conditional probabilities $t^{(s)}_{ik}$ that the ith observation belongs to the kth cluster are computed for $i=1,\ldots ,n$ and $k=1,\ldots ,K$,
$$\begin{aligned} t^{(s)}_{ik} = \mathbb {P}\big (z_{ik} = 1 \mid \bar{\mathbf {y}}, \alpha ^{(s)}\big ) = \frac{\pi _k^{(s)} \phi \Big ( \bar{\mathbf {y}}_i \mid \mu ^{(s)}_k,\varSigma ^{(s)}_k \Big )}{ \sum _{k' = 1}^K \pi _{k'}^{(s)} \phi \Big (\bar{\mathbf {y}}_i \mid \mu ^{(s)}_{k'}, \varSigma ^{(s)}_{k'} \Big )}. \end{aligned}$$
M-step: This step consists of maximizing the expected complete log-likelihood derived from the E-step. It leads to the following mixture parameter updates:
- The updated proportions are $\pi ^{(s+1)}_k = \frac{1}{n} \sum _{i = 1}^n t^{(s)}_{ik}$ for $k=1,\ldots ,K$.
- The updated means $\mu ^{(s+1)}_1, \ldots , \mu ^{(s+1)}_K$ are computed using formulas (14) and (15) of Zhou et al. (2009): the jth coordinate of $\mu ^{(s+1)}_k$ is the solution of the following equations:
  $$\begin{aligned} \text {if} \quad \left| \sum _{i = 1}^n t^{(s)}_{ik} \left[ \underset{v \ne j}{\sum _{v = 1}^p} \left( \bar{y}_{iv} - \mu ^{(s)}_{k v}\right) \varTheta ^{(s)}_{k,v j} + \bar{y}_{ij} \varTheta ^{(s)}_{k,jj}\right] \right| \le \lambda , \quad \text {then} \quad \mu ^{(s+1)}_{kj} = 0, \end{aligned}$$
  otherwise:
  $$\begin{aligned}&\left[ \sum _{i = 1}^n t^{(s)}_{i k}\right] \mu ^{(s+1)}_{kj} \varTheta ^{(s)}_{k,jj} + \ \lambda \ \text {sign}\left( \mu ^{(s+1)}_{kj}\right) \\&\quad = \sum _{i = 1}^n t_{ik}^{(s)} \sum _{v = 1}^p \bar{y}_{iv} \varTheta ^{(s)}_{k,v j} \\&\qquad - \left[ \sum _{i = 1}^n t^{(s)}_{i k}\right] \left[ \left( \sum _{v = 1}^p \mu ^{(s)}_{kv} \varTheta ^{(s)}_{k,vj}\right) - \mu ^{(s)}_{kj} \varTheta ^{(s)}_{k, jj}\right] . \end{aligned}$$
- For all $k=1,\ldots ,K$, the covariance matrix $\varSigma _k^{(s+1)}$ is obtained via the precision matrix $\varTheta _k^{(s+1)}$. The glasso algorithm (available in the R package glasso of Friedman et al. 2014) is used to solve the following minimization problem on the set of symmetric positive definite matrices (denoted $\varTheta \succ 0$):
  $$\begin{aligned} \underset{\varTheta \succ 0}{\mathop {\mathrm {arg\,min}}}\left\{ -\ln \text {det}\left( \varTheta \right) + \text {trace}\left( S_k^{(s+1)} \varTheta \right) + \rho _k^{(s+1)} \Vert \varTheta \Vert _1\right\} , \end{aligned}$$
  where $ \rho _k^{(s+1)} = 2 \rho \left( \sum _{i = 1}^n t^{(s)}_{ik}\right) ^{-1}$ and
  $$\begin{aligned} S^{(s+1)}_k = \frac{\sum _{i = 1}^n t^{(s)}_{ik}(\bar{\mathbf {y}}_i - \mu ^{(s+1)}_k) (\bar{\mathbf {y}}_i - \mu ^{(s+1)}_k)^\top }{\sum _{i = 1}^n t^{(s)}_{ik}}. \end{aligned}$$
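The E-step and the proportion and mean updates above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`e_step`, `update_proportions`, `update_mean_coord`) are hypothetical, the log-densities $\ln \phi (\bar{\mathbf {y}}_i \mid \mu _k, \varSigma _k)$ are assumed to be computed elsewhere, and the precision update is left to a graphical lasso solver and not shown. Because $\varTheta _{k,jj} > 0$, the two-case mean equations reduce to a soft-thresholding rule, which is what the last function applies.

```python
import math


def e_step(log_dens, pi):
    """E-step: t[i][k] is proportional to pi[k] * phi(y_i | mu_k, Sigma_k).

    log_dens[i][k] is assumed to hold ln phi(y_i | mu_k, Sigma_k),
    computed elsewhere; working on the log scale avoids underflow.
    """
    t = []
    for row in log_dens:
        logs = [math.log(p_k) + ld for p_k, ld in zip(pi, row)]
        top = max(logs)
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        t.append([w / total for w in weights])
    return t


def update_proportions(t):
    """M-step for the proportions: pi_k = (1/n) * sum_i t_ik."""
    n = len(t)
    return [sum(row[k] for row in t) / n for k in range(len(t[0]))]


def update_mean_coord(t_k, y, mu_k, theta_k, j, lam):
    """M-step for the jth coordinate of mu_k by soft-thresholding.

    t_k[i] = t_ik, y[i][v] = centered observation, mu_k = current mean
    of component k, theta_k = current precision matrix (list of rows).
    """
    p = len(mu_k)
    # a = sum_i t_ik [ sum_{v != j} (y_iv - mu_kv) theta_vj + y_ij theta_jj ],
    # i.e. the right-hand side of the "otherwise" equation.
    a = 0.0
    for t_ik, y_i in zip(t_k, y):
        inner = y_i[j] * theta_k[j][j]
        for v in range(p):
            if v != j:
                inner += (y_i[v] - mu_k[v]) * theta_k[v][j]
        a += t_ik * inner
    if abs(a) <= lam:
        return 0.0  # the coordinate is shrunk exactly to zero
    n_k = sum(t_k)
    return math.copysign(abs(a) - lam, a) / (n_k * theta_k[j][j])
```

In a full iteration one would call `update_mean_coord` for each component and each coordinate, then recompute $S_k^{(s+1)}$ and pass it with $\rho _k^{(s+1)}$ to the graphical lasso.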

1.2 The classification case

The maximization of the regularized criterion (4) with respect to $\mu _1, \ldots , \mu _K$ and $\varTheta _1, \ldots , \varTheta _K$ is achieved using an algorithm similar to the one presented in Sect. 1.1; since the labels $z_i$ are known, no E-step is needed.

The jth coordinate of the mean vector $\mu _k$ is the solution of the following equations:

$$\begin{aligned} \text {if} \quad \left| \sum _{i = 1}^n \mathbb {1}_{\{z_i = k\}} \left[ \underset{v \ne j}{\sum _{v = 1}^p} \left( \bar{y}_{iv} - \mu _{k v}\right) \varTheta _{k,v j} + \bar{y}_{ij} \varTheta _{k,jj}\right] \right| \le \lambda , \quad \text {then} \quad \mu _{kj} = 0, \end{aligned}$$

otherwise:

$$\begin{aligned}&\left[ \sum _{i = 1}^n \mathbb {1}_{\{z_i = k\}} \right] \mu _{kj} \varTheta _{k,jj} + \lambda \ \text {sign}\left( \mu _{kj}\right) \\&\quad = \sum _{i = 1}^n \mathbb {1}_{\{z_i = k\}} \sum _{v = 1}^p \bar{y}_{iv} \varTheta _{k,v j} \\&\qquad - \left[ \sum _{i = 1}^n \mathbb {1}_{\{z_i = k\}} \right] \left[ \left( \sum _{v = 1}^p \mu _{kv} \varTheta _{k,v j}\right) - \mu _{kj} \varTheta _{k, jj}\right] . \end{aligned}$$
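Since $\varTheta _{k,jj} > 0$, the two cases can be combined into a single soft-thresholding rule. As a sketch, write $n_k = \sum _{i=1}^n \mathbb {1}_{\{z_i = k\}}$ and let $A_{kj}$ denote the right-hand side of the last equation (both are shorthand introduced here, not notation from the paper). The term $\mu _{kj} \varTheta _{k,jj}$ cancels inside $A_{kj}$, so $A_{kj}$ does not depend on $\mu _{kj}$, and

$$\begin{aligned} n_k \, \mu _{kj} \varTheta _{k,jj} + \lambda \ \text {sign}\left( \mu _{kj}\right) = A_{kj} \quad \Longrightarrow \quad \mu _{kj} = \text {sign}\left( A_{kj}\right) \frac{\left( |A_{kj}| - \lambda \right) _+}{n_k \varTheta _{k,jj}}, \end{aligned}$$

where $(x)_+ = \max (x, 0)$. The case $|A_{kj}| \le \lambda $ gives $\mu _{kj} = 0$, in agreement with the first condition.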

To estimate the sparse precision matrices $\varTheta _1, \ldots , \varTheta _K$ from the data set $\mathbf {y}$ and the labels $\mathbf {z}$, we use the glasso algorithm to solve the following minimization problem on the set of symmetric positive definite matrices

$$\begin{aligned} \widehat{\varTheta }_k = \underset{\varTheta \succ 0}{\mathop {\mathrm {arg\,min}}}\Big \{ -\ln \text {det}\big (\varTheta \big ) + \text {trace}\big (S_k \varTheta \big ) + \rho _k \Vert \varTheta \Vert _1\Big \}, \end{aligned}$$

(6)

for each $k = 1, \ldots , K$. The $\ell _1$ regularization parameter in (6) is given by $\rho _k = 2 \rho \left( \sum _{i = 1}^n \mathbb {1}_{\{z_{i} = k\}}\right) ^{-1}$ and the empirical covariance matrix $S_k$ by

$$\begin{aligned} S_k = \frac{\sum _{i = 1}^n \mathbb {1}_{\{z_i=k\}} (\bar{\mathbf {y}}_i -\mu _k) (\bar{\mathbf {y}}_i - \mu _k)^\top }{ \sum _{i =1}^n\mathbb {1}_{\{z_i=k\}} }. \end{aligned}$$
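The two inputs of problem (6) can be computed directly from the labeled data. The following is a minimal sketch in plain Python, with a hypothetical function name and argument layout (lists of rows) chosen for illustration only. The scaling $\rho _k = 2 \rho / n_k$ is what one obtains by dividing the class-$k$ part of the penalized log-likelihood by $n_k / 2$, where $n_k$ is the number of observations labeled $k$.

```python
def class_covariance_and_penalty(y, z, mu_k, k, rho):
    """Return (S_k, rho_k) for class k.

    y[i] is the ith centered observation, z[i] its known label,
    mu_k the current mean of class k, rho the global l1 penalty.
    """
    members = [y_i for y_i, z_i in zip(y, z) if z_i == k]
    n_k = len(members)
    if n_k == 0:
        raise ValueError("class %r has no observations" % (k,))
    p = len(mu_k)
    s_k = [[0.0] * p for _ in range(p)]
    for y_i in members:
        d = [y_i[v] - mu_k[v] for v in range(p)]
        for a in range(p):
            for b in range(p):
                # average of the outer products (y_i - mu_k)(y_i - mu_k)^T
                s_k[a][b] += d[a] * d[b] / n_k
    rho_k = 2.0 * rho / n_k
    return s_k, rho_k
```

The pair `(s_k, rho_k)` would then be handed to a graphical lasso solver, such as the glasso R package cited in the text, to obtain $\widehat{\varTheta }_k$.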

The updates of $(\mu _1, \ldots , \mu _K)$ and $(\varTheta _1, \ldots , \varTheta _K)$ are then alternated, in the manner of coordinate descent, until convergence.

About this article

Cite this article

Celeux, G., Maugis-Rabusseau, C. & Sedki, M. Variable selection in model-based clustering and discriminant analysis with a regularization approach. Adv Data Anal Classif 13, 259–278 (2019). https://doi.org/10.1007/s11634-018-0322-5

Received: 14 February 2017
Revised: 01 March 2018
Accepted: 29 March 2018
Published: 11 April 2018
Issue Date: 08 March 2019
DOI: https://doi.org/10.1007/s11634-018-0322-5
