Space alternating penalized Kullback proximal point algorithms for maximizing likelihood with nondifferentiable penalty

Published in: Annals of the Institute of Statistical Mathematics

Abstract

The EM algorithm is a widely used methodology for penalized likelihood estimation. Provable monotonicity and convergence are hallmarks of the EM algorithm, and these properties are well established for smooth likelihood and smooth penalty functions. However, many relaxed versions of variable selection penalties are not smooth. In this paper, we introduce a new class of space alternating penalized Kullback proximal extensions of the EM algorithm for nonsmooth likelihood inference. We show that the cluster points of the new method are stationary points even when they lie on the boundary of the parameter set. We illustrate the new class of algorithms on model selection for finite mixtures of regressions and on sparse image reconstruction.
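
To make the setting concrete, the sketch below implements a simplified penalized EM for a two-component mixture of linear regressions, where the nondifferentiable l1 variable-selection penalty is handled in the M-step through its proximal operator (soft-thresholding). This is a minimal illustration of the general idea, not the paper's space alternating Kullback proximal algorithm; the function names (`penalized_em_mixreg`, `soft_threshold`) and all parameter choices are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*|.|, i.e. the soft-thresholding map."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_em_mixreg(X, y, K=2, lam=0.1, n_iter=100, seed=0):
    """Penalized EM for a K-component mixture of linear regressions
    with an l1 penalty on the regression coefficients.

    E-step: posterior responsibilities under the current parameters.
    M-step: weighted least squares updated coordinate-wise, with the
    nonsmooth l1 penalty handled by its proximal map (soft-thresholding).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = rng.normal(scale=0.1, size=(K, p))
    sigma2 = np.full(K, y.var())
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: r[i, k] proportional to pi_k * N(y_i | x_i' beta_k, sigma2_k)
        resid = y[:, None] - X @ beta.T                       # shape (n, K)
        logdens = -0.5 * (np.log(2 * np.pi * sigma2) + resid**2 / sigma2)
        logw = np.log(pi) + logdens
        logw -= logw.max(axis=1, keepdims=True)               # stabilize exp
        r = np.exp(logw)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: mixing weights, then penalized weighted regressions
        pi = r.mean(axis=0)
        for k in range(K):
            w = r[:, k]
            for j in range(p):                                # one CD pass
                partial = y - X @ beta[k] + X[:, j] * beta[k, j]
                num = np.sum(w * X[:, j] * partial)
                den = np.sum(w * X[:, j] ** 2)
                beta[k, j] = soft_threshold(num, n * lam) / den
            resid_k = y - X @ beta[k]
            sigma2[k] = np.sum(w * resid_k**2) / w.sum()
    return pi, beta, sigma2
```

With a sparse lasso-type penalty, coefficients of irrelevant covariates are driven exactly to zero in each component, which is the variable-selection behavior that motivates nonsmooth penalties in the first place.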



Author information

Correspondence to Stéphane Chrétien.

Cite this article

Chrétien, S., Hero, A. & Perdry, H. Space alternating penalized Kullback proximal point algorithms for maximizing likelihood with nondifferentiable penalty. Ann Inst Stat Math 64, 791–809 (2012). https://doi.org/10.1007/s10463-011-0333-x
