
MM for penalized estimation


Penalized estimation performs variable selection and parameter estimation simultaneously. The general framework minimizes a loss function subject to a penalty designed to produce sparse variable selection. The majorization–minimization (MM) algorithm is a computational scheme prized for its stability and simplicity, and it has been widely applied to penalized estimation. Much of the previous work has focused on convex loss functions, such as those arising in generalized linear models. When data are contaminated with outliers, however, robust loss functions can produce more reliable estimates, and recent literature has seen a growing impact of nonconvex loss-based methods for robust estimation. This article investigates the MM algorithm for penalized estimation, provides new optimality conditions, and establishes convergence theory for both convex and nonconvex loss functions. For applications, we focus on several nonconvex loss functions previously studied in machine learning for regression and classification problems. The performance of the proposed algorithms is evaluated on simulated and real data, including cancer clinical status. Efficient implementations of the algorithms are available in the R package mpath on CRAN.
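To give a concrete flavor of the MM scheme described above, the sketch below applies the standard quadratic majorization of the lasso penalty (in the spirit of Hunter and Li 2005) to least squares. This is an illustrative sketch only, not the mpath implementation; the function name `mm_lasso` and the perturbation `eps` are our own choices for the example.

```python
import numpy as np

def mm_lasso(X, y, lam, eps=1e-8, n_iter=200):
    """MM algorithm for lasso-penalized least squares (illustrative sketch).

    At iterate b_k, each penalty term lam*|b_j| is majorized by the quadratic
    lam*(b_j^2 + b_kj^2) / (2*(|b_kj| + eps)) + const, so minimizing the
    surrogate reduces to a ridge-type linear solve at every step.
    """
    n, p = X.shape
    beta = np.zeros(p)
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        d = lam / (np.abs(beta) + eps)            # weights from the majorizer
        beta_new = np.linalg.solve(XtX + np.diag(d), Xty)
        if np.max(np.abs(beta_new - beta)) < 1e-10:
            beta = beta_new
            break
        beta = beta_new
    return beta
```

Because the surrogate majorizes the penalized objective and is tangent at the current iterate, the objective decreases monotonically (the MM descent property); for an orthogonal design the iterates approach the familiar soft-thresholding solution.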






Acknowledgements

The author would like to thank the Associate Editor and two anonymous referees for their constructive comments, which have led to a much improved paper.

Author information



Corresponding author

Correspondence to Zhu Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 304 KB)


About this article


Cite this article

Wang, Z. MM for penalized estimation. TEST 31, 54–75 (2022).




Keywords

  • Classification
  • MM algorithm
  • Nonconvex
  • Quadratic majorization
  • Regression
  • Robust estimation
  • Variable selection

Mathematics Subject Classification

  • 62F35
  • 62H30
  • 62J07
  • 62-08
  • 68U01