ℓ₁-penalization for mixture regression models

Städler, Nicolas; Bühlmann, Peter; van de Geer, Sara

doi:10.1007/s11749-010-0197-z

Invited Paper
Published: 30 June 2010

TEST, Volume 19, pages 209–256 (2010)

Abstract

We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than the sample size. We propose an ℓ₁-penalized maximum likelihood estimator in an appropriate parameterization. This kind of estimation belongs to a class of problems where optimization and theory for non-convex functions are needed. This sets it clearly apart from high-dimensional estimation with convex loss or objective functions, for example, the Lasso in linear or generalized linear models. Mixture models represent a prime and important example where non-convexity arises.
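
To make the non-convexity point concrete, here is a small Python sketch. It assumes a Gaussian FMR model with components indexed by r, the plain (β_r, σ_r, π_r) parameterization, and a simple penalty λ · Σ_r ||β_r||₁; this is a simplification chosen for illustration and not the parameterization used in the paper. The function name `fmr_objective` and the toy data are hypothetical. Relabelling the mixture components leaves the objective unchanged, so two label-swapped parameter vectors have the same value, while their midpoint (both slopes zero) fits much worse. A convex function could not behave this way.

```python
import math
import random


def fmr_objective(X, y, betas, sigmas, pis, lam):
    """Averaged negative log-likelihood of a Gaussian FMR model plus lam * sum_r ||beta_r||_1.

    Simplified illustration in the plain (beta_r, sigma_r, pi_r) parameterization.
    """
    nll = 0.0
    for xi, yi in zip(X, y):
        logs = []
        for beta, sigma, weight in zip(betas, sigmas, pis):
            mean = sum(b * xj for b, xj in zip(beta, xi))
            z = (yi - mean) / sigma
            logs.append(math.log(weight) - math.log(sigma)
                        - 0.5 * math.log(2 * math.pi) - 0.5 * z * z)
        m = max(logs)  # log-sum-exp for numerical stability
        nll -= m + math.log(sum(math.exp(v - m) for v in logs))
    penalty = lam * sum(sum(abs(b) for b in beta) for beta in betas)
    return nll / len(y) + penalty


# Toy data (hypothetical): half the points follow y = 2x, half follow y = -2x.
random.seed(0)
X, y = [], []
for i in range(200):
    x = random.uniform(-1.0, 1.0)
    X.append([x])
    y.append((2.0 if i % 2 == 0 else -2.0) * x + random.gauss(0.0, 0.5))

sigmas, pis, lam = [0.5, 0.5], [0.5, 0.5], 0.1
f_a = fmr_objective(X, y, [[2.0], [-2.0]], sigmas, pis, lam)
f_b = fmr_objective(X, y, [[-2.0], [2.0]], sigmas, pis, lam)   # same model, labels swapped
f_mid = fmr_objective(X, y, [[0.0], [0.0]], sigmas, pis, lam)  # midpoint of the two
# Convexity would force f_mid <= (f_a + f_b) / 2; here f_mid is clearly larger.
```

Label switching is only one source of non-convexity in mixture likelihoods, but it is the easiest one to exhibit numerically.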

For FMR models, we develop an efficient EM algorithm for numerical optimization with provable convergence properties. Our penalized estimator is numerically better posed (e.g., boundedness of the criterion function) than unpenalized maximum likelihood estimation, and it allows for effective statistical regularization including variable selection. We also present some asymptotic theory and oracle inequalities: due to non-convexity of the negative log-likelihood function, different mathematical arguments are needed than for problems with convex losses. Finally, we apply the new method to both simulated and real data.
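
The general shape of a penalized EM iteration can be sketched in Python under the same simplifying assumptions (Gaussian FMR, plain (β_r, σ_r, π_r) parameterization, penalty λ · Σ_r ||β_r||₁). This is an illustrative generalized EM written for this note, not the authors' algorithm, and all names (`penalized_em`, `cd_sweeps`, `sigma_floor`) are hypothetical. The E-step computes posterior component probabilities. The M-step sets π_r to the average responsibility, updates β_r by a few sweeps of coordinate descent on a weighted Lasso whose effective penalty is λσ_r², and sets σ_r to the weighted residual standard deviation. None of these steps increases the penalized objective, so the recorded values should be non-increasing. The floor on σ_r is an assumed safeguard: in this simple parameterization the criterion is unbounded as σ_r → 0, which is the kind of ill-posedness the abstract says the proposed estimator avoids.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def log_weighted_density(xi, yi, beta, sigma, weight):
    """log(weight * N(yi; xi^T beta, sigma^2)) for one mixture component."""
    mean = sum(b * xj for b, xj in zip(beta, xi))
    z = (yi - mean) / sigma
    return math.log(weight) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def objective(X, y, betas, sigmas, pis, lam):
    """Averaged negative log-likelihood plus lam * sum_r ||beta_r||_1."""
    nll = 0.0
    for xi, yi in zip(X, y):
        logs = [log_weighted_density(xi, yi, b, s, w) for b, s, w in zip(betas, sigmas, pis)]
        m = max(logs)
        nll -= m + math.log(sum(math.exp(v - m) for v in logs))
    return nll / len(y) + lam * sum(sum(abs(b) for b in beta) for beta in betas)


def penalized_em(X, y, betas, sigmas, pis, lam, n_iter=30, cd_sweeps=5, sigma_floor=1e-3):
    """Illustrative generalized EM for the simplified l1-penalized Gaussian FMR criterion."""
    n, p, k = len(y), len(X[0]), len(betas)
    betas = [list(b) for b in betas]
    sigmas, pis = list(sigmas), list(pis)
    history = [objective(X, y, betas, sigmas, pis, lam)]
    for _ in range(n_iter):
        # E-step: responsibilities resp[i][r] = P(component r | x_i, y_i)
        resp = []
        for xi, yi in zip(X, y):
            logs = [log_weighted_density(xi, yi, b, s, w) for b, s, w in zip(betas, sigmas, pis)]
            m = max(logs)
            e = [math.exp(v - m) for v in logs]
            resp.append([v / sum(e) for v in e])
        # M-step: the expected complete-data criterion separates over components
        for r in range(k):
            wr = [resp[i][r] for i in range(n)]
            pis[r] = sum(wr) / n
            beta, t = betas[r], lam * sigmas[r] ** 2  # weighted Lasso, penalty lam * sigma_r^2
            for _ in range(cd_sweeps):
                for j in range(p):
                    num = den = 0.0
                    for i in range(n):
                        partial = y[i] - sum(beta[q] * X[i][q] for q in range(p) if q != j)
                        num += wr[i] * X[i][j] * partial
                        den += wr[i] * X[i][j] ** 2
                    beta[j] = soft_threshold(num / n, t) / (den / n) if den > 0 else 0.0
            # sigma_r: weighted residual standard deviation, kept above a floor
            rss = sum(wr[i] * (y[i] - sum(b * xj for b, xj in zip(beta, X[i]))) ** 2 for i in range(n))
            sigmas[r] = max(sigma_floor, math.sqrt(rss / sum(wr)))
        history.append(objective(X, y, betas, sigmas, pis, lam))
    return betas, sigmas, pis, history


# Toy data (hypothetical): two regimes with slopes +2 and -2 on the first of 3 covariates.
random.seed(1)
X, y = [], []
for i in range(200):
    xi = [random.gauss(0.0, 1.0) for _ in range(3)]
    X.append(xi)
    y.append((2.0 if i % 2 == 0 else -2.0) * xi[0] + random.gauss(0.0, 0.5))

betas, sigmas, pis, hist = penalized_em(
    X, y, betas=[[0.5, 0.0, 0.0], [-0.5, 0.0, 0.0]],
    sigmas=[1.0, 1.0], pis=[0.5, 0.5], lam=0.05)
```

Only the monotone decrease of `hist` is claimed for this sketch; as usual for non-convex criteria, which stationary point the iterates approach depends on the starting values.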

Authors and Affiliations

Seminar für Statistik, ETH Zürich, 8092 Zürich, Switzerland
Nicolas Städler, Peter Bühlmann & Sara van de Geer

Corresponding author

Correspondence to Peter Bühlmann.

Additional information

This invited paper is discussed in the comments available at: doi:10.1007/s11749-010-0198-y, doi:10.1007/s11749-010-0199-x, doi:10.1007/s11749-010-0200-8, doi:10.1007/s11749-010-0201-7, doi:10.1007/s11749-010-0202-6.

About this article

Cite this article

Städler, N., Bühlmann, P. & van de Geer, S. ℓ₁-penalization for mixture regression models. TEST 19, 209–256 (2010). https://doi.org/10.1007/s11749-010-0197-z

Received: 03 June 2009
Accepted: 23 May 2010
Issue Date: August 2010
