Volume 19, Issue 2, pp 209–256

ℓ1-penalization for mixture regression models

  • Nicolas Städler
  • Peter Bühlmann
  • Sara van de Geer
Invited Paper


We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than the sample size. We propose an ℓ1-penalized maximum likelihood estimator in an appropriate parameterization. This kind of estimation belongs to a class of problems where optimization and theory for non-convex functions are needed. This distinguishes it clearly from high-dimensional estimation with convex loss or objective functions, as, for example, with the Lasso in linear or generalized linear models. Mixture models are a prime example where non-convexity arises.
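For orientation, a Gaussian FMR model and a generic ℓ1-penalized likelihood criterion can be written as follows. This is only an illustrative sketch: the symbols (k components, mixing weights π_r, coefficients β_r, variances σ_r², tuning parameter λ) are assumptions introduced here, and the penalty is shown in the plain (β, σ) parameterization, which is not necessarily the "appropriate parameterization" the abstract refers to.

```latex
% Conditional density of a k-component Gaussian mixture of regressions
f_\theta(y \mid x) = \sum_{r=1}^{k} \pi_r \,
  \frac{1}{\sqrt{2\pi}\,\sigma_r}
  \exp\!\left( -\frac{(y - x^\top \beta_r)^2}{2\sigma_r^2} \right),
\qquad \pi_r \ge 0, \quad \sum_{r=1}^{k} \pi_r = 1 .

% Generic l1-penalized maximum likelihood criterion (illustrative form)
\hat{\theta}_\lambda \in \arg\min_{\theta}
  \left\{ -\frac{1}{n} \sum_{i=1}^{n} \log f_\theta(y_i \mid x_i)
          + \lambda \sum_{r=1}^{k} \lVert \beta_r \rVert_1 \right\} .
```

The log of a sum of exponentials makes the first term non-convex in θ, which is the source of the non-convexity mentioned above.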

For FMR models, we develop an efficient EM algorithm for numerical optimization with provable convergence properties. Our penalized estimator is numerically better posed (e.g., boundedness of the criterion function) than unpenalized maximum likelihood estimation, and it allows for effective statistical regularization including variable selection. We also present some asymptotic theory and oracle inequalities: due to non-convexity of the negative log-likelihood function, different mathematical arguments are needed than for problems with convex losses. Finally, we apply the new method to both simulated and real data.
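The idea of a generalized EM iteration for a penalized mixture of regressions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses the plain (β, σ) parameterization, an unscaled ℓ1 penalty, one coordinate-descent sweep per M-step, and a variance floor to keep the criterion bounded. The function names (`fit_l1_fmr`, `objective`, `soft_threshold`) and all defaults are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def log_components(X, y, beta, sigma2, pi):
    """log(pi_r * N(y_i; x_i^T beta_r, sigma2_r)) as an (n, K) array."""
    resid = y[:, None] - X @ beta.T
    return np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2) - resid**2 / (2 * sigma2)


def objective(X, y, beta, sigma2, pi, lam):
    """-(1/n) * log-likelihood + lam * sum_r ||beta_r||_1."""
    lc = log_components(X, y, beta, sigma2, pi)
    m = lc.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    loglik = np.sum(m[:, 0] + np.log(np.exp(lc - m).sum(axis=1)))
    return -loglik / len(y) + lam * np.abs(beta).sum()


def fit_l1_fmr(X, y, K=2, lam=0.05, n_iter=50, seed=0):
    """Generalized EM sketch; returns (beta, sigma2, pi, objective history)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(scale=0.1, size=(K, p))  # small random start breaks symmetry
    sigma2 = np.full(K, y.var())
    pi = np.full(K, 1.0 / K)
    history = []
    for _ in range(n_iter):
        # E-step: responsibilities W[i, r] = P(component r | x_i, y_i).
        lc = log_components(X, y, beta, sigma2, pi)
        lc -= lc.max(axis=1, keepdims=True)
        W = np.exp(lc)
        W /= W.sum(axis=1, keepdims=True)
        # M-step (generalized): improve, rather than fully minimize, the criterion.
        pi = W.mean(axis=0)
        for r in range(K):
            w = W[:, r]
            resid = y - X @ beta[r]
            for j in range(p):  # one sweep of weighted Lasso coordinate descent
                xj = X[:, j]
                resid += xj * beta[r, j]  # partial residual without coordinate j
                a = np.sum(w * xj**2) / (n * sigma2[r])
                c = np.sum(w * xj * resid) / (n * sigma2[r])
                if a > 0:
                    beta[r, j] = soft_threshold(c, lam) / a
                resid -= xj * beta[r, j]
            # Weighted variance update, floored so the likelihood stays bounded.
            sigma2[r] = max(np.sum(w * resid**2) / max(w.sum(), 1e-12), 1e-6)
        history.append(objective(X, y, beta, sigma2, pi, lam))
    return beta, sigma2, pi, history
```

Calling `fit_l1_fmr(X, y, K=2, lam=0.05)` on an `(n, p)` design matrix `X` and response vector `y` returns the fitted coefficients together with the criterion value after each iteration. Because each M-step block only decreases the penalized criterion, the recorded values should be non-increasing, which is a simple way to check an implementation of this kind.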


Keywords: Adaptive Lasso; Finite mixture models; Generalized EM algorithm; High-dimensional estimation; Lasso; Oracle inequality

Mathematics Subject Classification (2000)

62J07 62F12 





Copyright information

© Sociedad de Estadística e Investigación Operativa 2010

Authors and Affiliations

  • Nicolas Städler (1)
  • Peter Bühlmann (1), corresponding author
  • Sara van de Geer (1)
  1. Seminar für Statistik, ETH Zürich, Zürich, Switzerland
