Newton’s Method and Scoring

Kenneth Lange
Part of the Statistics and Computing book series (SCO)


The MM and EM algorithms are hardly the only methods of optimization. Newton’s method is better known and more widely applied. We encountered Newton’s method in Section 5.4 of Chapter 5. Here we focus on the multidimensional version. Despite its defects, Newton’s method is the gold standard for speed of convergence and forms the basis of many modern optimization algorithms. Its variants seek to retain its fast convergence while taming its defects. The variants all revolve around the core idea of locally approximating the objective function by a strictly convex quadratic function. At each iteration the quadratic approximation is optimized. Safeguards are introduced to keep the iterates from veering toward irrelevant stationary points.
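The core idea above can be sketched in a few lines: each iteration minimizes the local quadratic model by solving for the Newton step, and a simple safeguard (step halving) keeps the iterates from overshooting. The following is a minimal illustrative sketch, not the chapter's algorithm; the test problem and function names are chosen here for demonstration.

```python
import numpy as np

def safeguarded_newton(f, grad, hess, x, tol=1e-10, max_iter=100):
    """Minimize f by Newton's method with step halving (illustrative sketch).

    Each iteration optimizes the local quadratic model
    f(x) + grad(x)^T d + (1/2) d^T hess(x) d, whose minimizer is the
    Newton step d = -hess(x)^{-1} grad(x). Halving the step until the
    objective decreases is one simple safeguard against divergence.
    """
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)          # Newton direction
        t = 1.0
        while f(x + t * d) > f(x) and t > 1e-12:  # step-halving safeguard
            t *= 0.5
        x = x + t * d
    return x

# Strictly convex test problem: f(x) = sum(exp(x_i) - x_i), minimized at x = 0.
f = lambda x: np.sum(np.exp(x) - x)
grad = lambda x: np.exp(x) - 1.0
hess = lambda x: np.diag(np.exp(x))

x_star = safeguarded_newton(f, grad, hess, np.array([2.0, -3.0]))
```

Without the step-halving loop, the full Newton step from this starting point overshoots badly (the second coordinate jumps far from the minimum); with it, the iterates settle into the fast local convergence for which Newton's method is prized.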


Keywords: Exponential Family · Secant Condition · Dirichlet Distribution · Multinomial Family · Ascent Algorithm





Copyright information

© Springer New York 2010

Authors and Affiliations

Departments of Biomathematics, Human Genetics, and Statistics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, USA
