Stochastic proximal-gradient algorithms for penalized mixed models

  • Gersende Fort
  • Edouard Ollier
  • Adeline Samson


Motivated by penalized likelihood maximization in complex models, we study optimization problems where neither the function to optimize nor its gradient has an explicit expression, but its gradient can be approximated by a Monte Carlo technique. We propose a new algorithm based on a stochastic approximation of the proximal-gradient (PG) algorithm. This new algorithm, named stochastic approximation PG (SAPG), combines a stochastic gradient descent step which, roughly speaking, computes a smoothed approximation of the gradient along the iterations, with a proximal step. The choice of the step size and of the Monte Carlo batch size for the stochastic gradient descent step in SAPG is discussed. Our convergence results cover the cases of biased and unbiased Monte Carlo approximations. While the convergence analysis of a classical Monte Carlo approximation of the gradient is already addressed in the literature (see Atchadé et al. in J Mach Learn Res 18(10):1–33, 2017), the convergence analysis of SAPG is new. Practical implementation is discussed, and guidelines to tune the algorithm are given. The two algorithms, the classical Monte Carlo PG and SAPG, are compared on a linear mixed effect model as a toy example. A more challenging application is proposed on nonlinear mixed effect models in high dimension with a pharmacokinetic data set including genomic covariates. To the best of our knowledge, our work provides the first convergence result of a numerical method designed to solve penalized maximum likelihood in a nonlinear mixed effect model.
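
The two-step structure described in the abstract (a smoothed Monte Carlo gradient estimate followed by a proximal step) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy objective f(θ) = ½ E|θ − Z|² with Z ~ N(μ, I), the ℓ1 penalty, the step-size schedules, and the function names `soft_threshold`, `mc_gradient` and `sapg_sketch`. The paper's exact recursion, its mixed-model setting and its step-size conditions are not reproduced.

```python
import random


def soft_threshold(x, t):
    """Proximal operator of t * |x| for a scalar x (soft-thresholding)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def mc_gradient(theta, mu, batch_size, rng):
    """Noisy Monte Carlo estimate of grad f(theta) = theta - mu.

    Toy assumption: f(theta) = 0.5 * E|theta - Z|^2 with Z ~ N(mu, I),
    so averaging theta - Z over a small batch of draws is unbiased.
    """
    grad = []
    for th, m in zip(theta, mu):
        z_bar = sum(rng.gauss(m, 1.0) for _ in range(batch_size)) / batch_size
        grad.append(th - z_bar)
    return grad


def sapg_sketch(mu, lam, n_iter=3000, batch_size=5, seed=0):
    """Smoothed stochastic gradient step followed by a proximal step."""
    rng = random.Random(seed)
    theta = [0.0] * len(mu)
    smoothed = [0.0] * len(mu)
    for n in range(1, n_iter + 1):
        delta = n ** -0.5          # smoothing weight (assumed schedule)
        gamma = 0.5 * n ** -0.75   # PG step size (assumed schedule)
        # Fresh Monte Carlo estimate of the gradient at the current point.
        noisy = mc_gradient(theta, mu, batch_size, rng)
        # Stochastic approximation: move the running estimate toward it.
        smoothed = [g + delta * (h - g) for g, h in zip(smoothed, noisy)]
        # Proximal step for the penalty lam * ||theta||_1.
        theta = [soft_threshold(th - gamma * g, gamma * lam)
                 for th, g in zip(theta, smoothed)]
    return theta


if __name__ == "__main__":
    print(sapg_sketch(mu=[2.0, -1.5, 0.1, 0.0], lam=0.5))
```

For this toy objective the penalized minimizer is known in closed form: it is the soft-thresholded mean, here (1.5, −1.0, 0, 0), so the printed iterate can be checked against it. The smoothing weight is chosen to decay more slowly than the step size so that the running gradient estimate tracks the moving iterate; this is a design choice of the sketch, not a condition quoted from the paper.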


Keywords: Proximal-gradient algorithm; Stochastic gradient; Stochastic EM algorithm; Stochastic approximation; Nonlinear mixed effect models


  1. Andrieu, C., Moulines, E.: On the ergodicity properties of some adaptive MCMC algorithms. Ann. Appl. Prob. 16(3), 1462–1505 (2006)
  2. Andrieu, C., Thoms, J.: A tutorial on adaptive MCMC. Stat. Comput. 18(4), 343–373 (2008)
  3. Atchadé, Y., Fort, G., Moulines, E.: On perturbed proximal gradient algorithms. J. Mach. Learn. Res. 18(10), 1–33 (2017)
  4. Bartholomew, D., Knott, M., Moustaki, I.: Latent Variable Models and Factor Analysis. Wiley Series in Probability and Statistics, 3rd edn. Wiley, Chichester (2011)
  5. Bauschke, H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, New York (2011)
  6. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
  7. Benveniste, A., Métivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations, Applications of Mathematics, vol. 22. Springer, Berlin (1990)
  8. Bertrand, J., Balding, D.: Multiple single nucleotide polymorphism analysis using penalized regression in nonlinear mixed-effect pharmacokinetic models. Pharmacogenet. Genomics 23(3), 167–174 (2013)
  9. Bickel, P.J., Doksum, K.A.: Mathematical Statistics: Basic Ideas and Selected Topics, vol. 1, 2nd edn. Texts in Statistical Science Series. CRC Press, Boca Raton (2015)
  10. Chen, J., Chen, Z.: Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95(3), 759–771 (2008)
  11. Chen, H., Zeng, D., Wang, Y.: Penalized nonlinear mixed effects model to identify biomarkers that predict disease progression. Biometrics 73, 1343–1354 (2017)
  12. Combettes, P., Pesquet, J.: Proximal splitting methods in signal processing. In: Bauschke, H., Burachik, R., Combettes, P., Elser, V., Luke, D., Wolkowicz, H. (eds.) Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications, vol. 49. Springer, New York (2011)
  13. Combettes, P.L., Pesquet, J.C.: Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM J. Optim. 25(2), 1221–1248 (2015)
  14. Combettes, P., Pesquet, J.: Stochastic approximations and perturbations in forward–backward splitting for monotone operators. Online J. Pure Appl. Funct. Anal. 1(1), 1–37 (2016)
  15. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
  16. Delavenne, X., Ollier, E., Basset, T., Bertoletti, L., Accassat, S., Garcin, A., Laporte, S., Zufferey, P., Mismetti, P.: A semi-mechanistic absorption model to evaluate drug–drug interaction with dabigatran: application with clarithromycin. Br. J. Clin. Pharmacol. 76(1), 107–113 (2013)
  17. Delyon, B., Lavielle, M., Moulines, E.: Convergence of a stochastic approximation version of the EM algorithm. Ann. Stat. 27(1), 94–128 (1999)
  18. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
  19. Fort, G., Moulines, E., Priouret, P.: Convergence of adaptive and interacting Markov chain Monte Carlo algorithms. Ann. Stat. 39(6), 3262–3289 (2011a)
  20. Fort, G., Moulines, E., Priouret, P.: Convergence of adaptive and interacting Markov chain Monte Carlo algorithms. Ann. Stat. 39(6), 3262–3289 (2011b)
  21. Fort, G., Jourdain, B., Kuhn, E., Lelièvre, T., Stoltz, G.: Convergence of the Wang–Landau algorithm. Math. Comput. 84(295), 2297–2327 (2015)
  22. Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)
  23. Gouin-Thibault, I., Delavenne, X., Blanchard, A., Siguret, V., Salem, J., Narjoz, C., Gaussem, P., Beaune, P., Funck-Brentano, C., Azizi, M., et al.: Interindividual variability in dabigatran and rivaroxaban exposure: contribution of ABCB1 genetic polymorphisms and interaction with clarithromycin. J. Thromb. Haemost. 15(2), 273–283 (2017)
  24. Hall, P., Heyde, C.C.: Martingale Limit Theory and Its Application. Probability and Mathematical Statistics. Academic Press, New York (1980)
  25. Kushner, H., Yin, G.: Stochastic Approximation and Recursive Algorithms and Applications: Applications of Mathematics, vol. 35, 2nd edn. Springer, New York (2003)
  26. Lehmann, E., Casella, G.: Theory of Point Estimation. Springer, New York (2006)
  27. Levine, R.A., Fan, J.: An automated (Markov chain) Monte Carlo EM algorithm. J. Stat. Comput. Simul. 74(5), 349–359 (2004)
  28. Lin, J., Rosasco, L., Villa, S., Zhou, D.: Modified Fejer sequences and applications. Technical report. arXiv:1510.04641v1 [math.OC] (2015)
  29. McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions. Wiley Series in Probability and Statistics, 2nd edn. Wiley-Interscience, Hoboken (2008)
  30. Meyer, R.R.: Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J. Comput. Syst. Sci. 12(1), 108–121 (1976)
  31. Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press, Cambridge (2009)
  32. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Compt. Rendus Math. l’Acad. Sci. 255, 2897–2899 (1962)
  33. Ng, S., Krishnan, T., McLachlan, G.: The EM algorithm. In: Gentle, J., Härdle, W., Mori, Y. (eds.) Handbook of Computational Statistics: Concepts and Methods, vol. 1, 2nd edn, pp. 139–172. Springer, Heidelberg (2012)
  34. Ollier, E., Hodin, S., Basset, T., Accassat, S., Bertoletti, L., Mismetti, P., Delavenne, X.: In vitro and in vivo evaluation of drug–drug interaction between dabigatran and proton pump inhibitors. Fundam. Clin. Pharmacol. 29(6), 604–614 (2015)
  35. Ollier, E., Samson, A., Delavenne, X., Viallon, V.: A SAEM algorithm for fused lasso penalized nonlinear mixed effect models: application to group comparison in pharmacokinetics. Comput. Stat. Data Anal. 95, 207–221 (2016)
  36. Parikh, N., Boyd, S.: Proximal Algorithms. Found. Trends Optim. 1(3), 123–231 (2013)
  37. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer Texts in Statistics, 2nd edn. Springer, New York (2004)
  38. Roberts, G., Rosenthal, J.: Coupling and ergodicity of adaptive MCMC. J. Appl. Prob. 44, 458–475 (2007)
  39. Rosasco, L., Villa, S., Vu, B.: Convergence of a stochastic proximal gradient algorithm. Technical report. arXiv:1403.5075v3 (2014)
  40. Rosasco, L., Villa, S., Vu, B.: A stochastic inertial forward–backward splitting algorithm for multivariate monotone inclusions. Optimization 65(6), 1293–1314 (2016)
  41. Saksman, E., Vihola, M.: On the ergodicity of the adaptive Metropolis algorithm on unbounded domains. Ann. Appl. Prob. 20(6), 2178–2203 (2010)
  42. Samson, A., Lavielle, M., Mentré, F.: The SAEM algorithm for group comparison tests in longitudinal data analysis based on non-linear mixed-effects model. Stat. Med. 26(27), 4860–4875 (2007)
  43. Schreck, A., Fort, G., Moulines, E.: Adaptive equi-energy sampler: convergence and illustration. ACM Trans. Model. Comput. Simul. 23(1), 5 (2013)
  44. Städler, N., Bühlmann, P., van de Geer, S.: \(\ell_1\)-penalization for mixture regression models. Test 19(2), 209–256 (2010)
  45. Wei, G., Tanner, M.: A Monte-Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J. Am. Stat. Assoc. 85, 699–704 (1990)
  46. Wu, C.F.: On the convergence properties of the EM algorithm. Ann. Stat. 11(1), 95–103 (1983)
  47. Zangwill, W.: Nonlinear Programming: A Unified Approach. Prentice-Hall International Series in Management. Prentice-Hall, Englewood Cliffs (1969)

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Gersende Fort (1)
  • Edouard Ollier (2, 3)
  • Adeline Samson (4)

  1. IMT UMR5219, CNRS, Université de Toulouse, Toulouse Cedex 9, France
  2. INSERM, U1059, Dysfonction Vasculaire et Hémostase, Saint Etienne, France
  3. U.M.P.A., Ecole Normale Supérieure de Lyon, CNRS UMR 5669, INRIA, Project-Team NUMED, Lyon Cedex 07, France
  4. Laboratoire Jean Kuntzmann, UMR CNRS 5224, Université Grenoble-Alpes, Grenoble, France
