
ℓ1-penalization for mixture regression models

  • Invited Paper

Abstract

We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than the sample size. We propose an ℓ1-penalized maximum likelihood estimator in an appropriate parameterization. This kind of estimation belongs to a class of problems where optimization and theory for non-convex functions are needed. It thus differs very clearly from high-dimensional estimation with convex loss or objective functions, as with, for example, the Lasso in linear or generalized linear models. Mixture models are a prime and important example where non-convexity arises.
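To fix notation, a Gaussian FMR model with K components and a generic form of the ℓ1-penalized criterion can be sketched as follows; this is an illustrative formulation only, and the precise (rescaled) parameterization used in the paper may differ:

```latex
% Gaussian FMR density and a generic l1-penalized negative log-likelihood;
% the paper's own estimator works in a rescaled parameterization of
% (beta_k, sigma_k), so this form is for orientation only.
\[
  f_\theta(y \mid x) = \sum_{k=1}^{K} \pi_k \,
    \frac{1}{\sqrt{2\pi}\,\sigma_k}
    \exp\!\Big(-\frac{(y - x^\top \beta_k)^2}{2\sigma_k^2}\Big),
  \qquad \sum_{k=1}^{K} \pi_k = 1,
\]
\[
  \hat\theta \in \operatorname*{arg\,min}_{\theta}\;
    -\frac{1}{n}\sum_{i=1}^{n}\log f_\theta(y_i \mid x_i)
    \;+\; \lambda \sum_{k=1}^{K} \|\beta_k\|_1 .
\]
```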

For FMR models, we develop an efficient EM algorithm for numerical optimization with provable convergence properties. Our penalized estimator is numerically better posed (e.g., boundedness of the criterion function) than unpenalized maximum likelihood estimation, and it allows for effective statistical regularization including variable selection. We also present some asymptotic theory and oracle inequalities: due to non-convexity of the negative log-likelihood function, different mathematical arguments are needed than for problems with convex losses. Finally, we apply the new method to both simulated and real data.
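To illustrate the penalized EM idea, here is a minimal sketch in Python/NumPy for a Gaussian FMR: the E-step computes component responsibilities, and the M-step runs a few coordinate-descent passes on an ℓ1-penalized weighted least-squares problem per component, with closed-form updates for the variances and mixing proportions. All function names are hypothetical, and this toy differs from the paper's actual generalized EM algorithm, which works in a rescaled parameterization with provable convergence properties:

```python
# Toy l1-penalized EM for a K-component Gaussian mixture of regressions.
# Illustrative sketch only, not the paper's algorithm.
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fmr_em_lasso(X, y, lam, K=2, n_iter=50, cd_passes=5, seed=0):
    """Fit a toy l1-penalized Gaussian FMR model by EM."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    beta = rng.normal(scale=0.1, size=(K, p))   # regression coefficients
    sigma2 = np.full(K, y.var())                # component variances
    pi_k = np.full(K, 1.0 / K)                  # mixing proportions
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] = P(component k | x_i, y_i).
        resid = y[:, None] - X @ beta.T                          # (n, K)
        log_dens = (-0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * resid**2 / sigma2)
        logw = np.log(pi_k) + log_dens
        logw -= logw.max(axis=1, keepdims=True)                  # stability
        gamma = np.exp(logw)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: per component, a weighted Lasso for beta via coordinate
        # descent with soft-thresholding, then closed-form sigma2 and pi.
        for k in range(K):
            w = gamma[:, k]
            for _ in range(cd_passes):
                for j in range(p):
                    # Partial residual excluding coordinate j.
                    r_j = y - X @ beta[k] + X[:, j] * beta[k, j]
                    num = np.dot(w * X[:, j], r_j)
                    denom = np.dot(w, X[:, j] ** 2)
                    beta[k, j] = soft_threshold(num, n * lam * sigma2[k]) / denom
            r = y - X @ beta[k]
            sigma2[k] = np.dot(w, r ** 2) / w.sum()
            pi_k[k] = w.mean()
    return beta, sigma2, pi_k
```

A call like `fmr_em_lasso(X, y, lam=0.05)` returns fitted coefficients, variances, and mixing proportions; in practice one would standardize the columns of X, restart from several initializations (the objective is non-convex, so EM only finds local optima), and select λ by a criterion such as BIC or cross-validation.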



Author information

Corresponding author

Correspondence to Peter Bühlmann.

Additional information

This invited paper is discussed in the comments available at: doi:10.1007/s11749-010-0198-y, doi:10.1007/s11749-010-0199-x, doi:10.1007/s11749-010-0200-8, doi:10.1007/s11749-010-0201-7, doi:10.1007/s11749-010-0202-6.



Cite this article

Städler, N., Bühlmann, P. & van de Geer, S. ℓ1-penalization for mixture regression models. TEST 19, 209–256 (2010). https://doi.org/10.1007/s11749-010-0197-z

