Statistics and Computing, Volume 21, Issue 2, pp 261–273

A quasi-Newton acceleration for high-dimensional optimization algorithms

Open Access


Abstract

In many statistical problems, maximum likelihood estimation by an EM or MM algorithm suffers from excruciatingly slow convergence. This tendency limits the application of these algorithms to modern high-dimensional problems in data mining, genomics, and imaging. Unfortunately, most existing acceleration techniques are ill-suited to complicated models involving large numbers of parameters. The squared iterative methods (SQUAREM) recently proposed by Varadhan and Roland constitute one notable exception. This paper presents a new quasi-Newton acceleration scheme that requires only modest increments in computation per iteration and overall storage and rivals or surpasses the performance of SQUAREM on several representative test problems.

Keywords: Maximum likelihood · Multivariate t · Admixture models · Imaging · Generalized eigenvalues
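The quasi-Newton scheme itself is developed in the body of the paper. As context for the SQUAREM comparison mentioned in the abstract, the following is a minimal sketch of one squared-extrapolation step for a fixed-point map F (e.g. an EM or MM update), in the spirit of Varadhan and Roland; the function name and the particular steplength formula below are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def squarem_step(F, x, eps=1e-10):
    """One SQUAREM-style squared-extrapolation step for a fixed-point map F.

    Illustrative sketch only: the steplength alpha used here is one of
    several published variants, not necessarily the one in the paper.
    """
    x1 = F(x)            # one algorithm update
    x2 = F(x1)           # a second ("squared") update
    r = x1 - x           # first difference
    v = (x2 - x1) - r    # second difference
    if np.linalg.norm(v) < eps:
        return x2        # differences negligible: effectively converged
    alpha = -np.linalg.norm(r) / np.linalg.norm(v)
    x_new = x - 2.0 * alpha * r + alpha ** 2 * v
    return F(x_new)      # stabilizing map evaluation, as in SQUAREM
```

On the linear contraction x ↦ 0.9x + 1 (fixed point 10), one such step started from x = 0 lands on the fixed point exactly, whereas the plain iteration shrinks the error only by a factor of 0.9 per update; this illustrates why extrapolation pays off precisely when the unaccelerated algorithm converges slowly.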


References

  1. Alexander, D.H., Novembre, J., Lange, K.L.: Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009)
  2. Becker, M.P., Yang, I., Lange, K.L.: EM algorithms without missing data. Stat. Methods Med. Res. 6, 37–53 (1997)
  3. de Leeuw, J.: Block relaxation algorithms in statistics. In: Bock, H.H., Lenski, W., Richter, M.M. (eds.) Information Systems and Data Analysis, pp. 308–325. Springer, New York (1994)
  4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 39(1), 1–38 (1977)
  5. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
  6. Griffiths, D.A.: Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics 29(4), 637–648 (1973)
  7. Heiser, W.J.: Convergent computing by iterative majorization: theory and applications in multidimensional data analysis. In: Krzanowski, W.J. (ed.) Recent Advances in Descriptive Multivariate Analysis, pp. 157–189. Clarendon Press, Oxford (1995)
  8. Hestenes, M.R., Karush, W.: A method of gradients for the calculation of the characteristic roots and vectors of a real symmetric matrix. J. Res. Natl. Bur. Stand. 47, 45–61 (1951a)
  9. Hestenes, M.R., Karush, W.: Solutions of Ax = λBx. J. Res. Natl. Bur. Stand. 47, 471–478 (1951b)
  10. Hotelling, H.: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441 (1933)
  11. Hotelling, H.: Relations between two sets of variables. Biometrika 28, 321–377 (1936)
  12. Hunter, D.R., Lange, K.L.: A tutorial on MM algorithms. Am. Stat. 58, 30–37 (2004)
  13. Jamshidian, M., Jennrich, R.I.: Conjugate gradient acceleration of the EM algorithm. J. Am. Stat. Assoc. 88(421), 221–228 (1993)
  14. Jamshidian, M., Jennrich, R.I.: Acceleration of the EM algorithm by using quasi-Newton methods. J. R. Stat. Soc. Ser. B 59(3), 569–587 (1997)
  15. Jolliffe, I.: Principal Component Analysis. Springer, New York (1986)
  16. Kent, J.T., Tyler, D.E., Vardi, Y.: A curious likelihood identity for the multivariate t-distribution. Commun. Stat. Simul. Comput. 23(2), 441–453 (1994)
  17. Lange, K.L., Carson, R.: EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 8(2), 306–316 (1984)
  18. Lange, K.L.: A quasi-Newton acceleration of the EM algorithm. Stat. Sin. 5(1), 1–18 (1995)
  19. Lange, K.L.: Numerical Analysis for Statisticians. Springer, New York (1999)
  20. Lange, K.L.: Optimization transfer using surrogate objective functions. J. Comput. Statist. 9, 1–59 (2000)
  21. Lange, K.L.: Optimization. Springer, New York (2004)
  22. Lange, K.L., Little, R.J.A., Taylor, J.M.G.: Robust statistical modeling using the t distribution. J. Am. Stat. Assoc. 84(408), 881–896 (1989)
  23. Lidwell, O.M., Somerville, T.: Observations on the incidence and distribution of the common cold in a rural community during 1948 and 1949. J. Hyg. Camb. 49, 365–381 (1951)
  24. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data, 2nd edn. Wiley-Interscience, New York (2002)
  25. Liu, C., Rubin, D.B.: The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4), 633–648 (1994)
  26. Liu, C., Rubin, D.B., Wu, Y.N.: Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika 85(4), 755–770 (1998)
  27. Louis, T.A.: Finding the observed information matrix when using the EM algorithm. J. R. Stat. Soc. Ser. B 44(2), 226–233 (1982)
  28. McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions, 2nd edn. Wiley-Interscience, New York (2008)
  29. Meng, X.L., Rubin, D.B.: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2), 267–278 (1993)
  30. Meng, X.L., van Dyk, D.: The EM algorithm—an old folk-song sung to a fast new tune (with discussion). J. R. Stat. Soc. Ser. B 59(3), 511–567 (1997)
  31. Mitchell, M., Gregersen, P., Johnson, S., Parsons, R., Vlahov, D.: The New York Cancer Project: rationale, organization, design, and baseline characteristics. J. Urban Health 61, 301–310 (2004)
  32. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York (2006)
  33. Saad, Y.: Numerical Methods for Large Eigenvalue Problems. Halsted Press [Wiley], New York (1992)
  34. Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)
  35. Ueda, N., Nakano, R.: Deterministic annealing EM algorithm. Neural Netw. 11, 271–282 (1998)
  36. Varadhan, R., Roland, C.: Squared extrapolation methods (SQUAREM): a new class of simple and efficient numerical schemes for accelerating the convergence of the EM algorithm. Johns Hopkins University, Department of Biostatistics Working Papers (Paper 63) (2004)
  37. Varadhan, R., Roland, C.: Simple and globally convergent methods for accelerating the convergence of any EM algorithm. Scand. J. Statist. 35(2), 335–353 (2008)
  38. Vardi, Y., Shepp, L.A., Kaufman, L.: A statistical model for positron emission tomography (with discussion). J. Am. Stat. Assoc. 80(389), 8–37 (1985)
  39. Wu, T.T., Lange, K.L.: The MM alternatives to EM. Stat. Sci. (2009, in press)
  40. Zhou, H., Lange, K.L.: Rating movies and rating the raters who rate them. Am. Stat. 63(4), 297–307 (2009)
  41. Zhou, H., Lange, K.L.: MM algorithms for some discrete multivariate distributions. J. Comput. Graph. Stat. (2009b, to appear)
  42. Zhou, H., Lange, K.L.: On the bumpy road to the dominant mode. Scand. J. Stat. (2009c, to appear)

Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Department of Human Genetics, University of California, Los Angeles, USA
  2. Department of Biomathematics, University of California, Los Angeles, USA
  3. Departments of Biomathematics, Human Genetics, and Statistics, University of California, Los Angeles, USA
