Abstract
The EM algorithm is a widely used methodology for penalized likelihood estimation. Provable monotonicity and convergence are the hallmarks of the EM algorithm, and these properties are well established for smooth likelihood and smooth penalty functions. However, many relaxed versions of variable selection penalties are not smooth. In this paper, we introduce a new class of space alternating penalized Kullback proximal extensions of the EM algorithm for nonsmooth likelihood inference. We show that the cluster points of the new method are stationary points even when they lie on the boundary of the parameter set. We illustrate the new class of algorithms on two problems: model selection for finite mixtures of regressions, and sparse image reconstruction.
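To convey the flavor of penalized EM with a nonsmooth penalty, the following is a minimal illustrative sketch, not the paper's algorithm: for an l1-penalized Gaussian (least-squares) likelihood with a linear observation model y = H·theta + noise, an EM-style surrogate maximization reduces the nonsmooth M-step to soft-thresholding. The function names and the choice of model are assumptions made for illustration only.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 penalty: exact solution of the
    # nonsmooth M-step for a separable quadratic surrogate.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def penalized_em_sparse(y, H, lam, n_iter=200):
    """Illustrative EM-style iteration for l1-penalized least squares.
    Each step maximizes a quadratic surrogate of the penalized
    log-likelihood; the surrogate maximizer is a soft-threshold."""
    theta = np.zeros(H.shape[1])
    alpha = np.linalg.norm(H, 2) ** 2  # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        # Surrogate (E-step analogue): gradient step on the smooth likelihood
        z = theta + H.T @ (y - H @ theta) / alpha
        # M-step analogue: exact maximization of surrogate plus l1 penalty
        theta = soft_threshold(z, lam / alpha)
    return theta
```

For H equal to the identity the iteration converges in one step to the soft-thresholded data, which is the exact penalized maximum likelihood estimate in that special case.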
Chrétien, S., Hero, A. & Perdry, H. Space alternating penalized Kullback proximal point algorithms for maximizing likelihood with nondifferentiable penalty. Ann Inst Stat Math 64, 791–809 (2012). https://doi.org/10.1007/s10463-011-0333-x