
Overview of Maximum Likelihood Estimation

Chapter in: Regression Modeling Strategies

Part of the book series: Springer Series in Statistics (SSS)

Abstract

In ordinary least squares multiple regression, the objective in fitting a model is to find the values of the unknown parameters that minimize the sum of squared errors of prediction. When the response variable is non-normal, polytomous, or not observed completely, one needs a more general objective function to optimize.
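Here is a minimal sketch of what a "more general objective function" looks like in practice. It assumes a binary logistic model with a single predictor and maximizes the Bernoulli log-likelihood by Newton-Raphson instead of minimizing a sum of squared errors. The function names `fit_logistic` and `_score_info_loglik` are hypothetical and are not part of the rms package. Pure Python is used only so that the example has no dependencies. Inverting the information matrix at the maximum gives the usual large-sample covariance estimate of the coefficients.

```python
import math


def _score_info_loglik(b0, b1, x, y):
    """Score vector, information matrix, and log-likelihood at (b0, b1)
    for the model logit P(Y=1 | x) = b0 + b1*x."""
    u0 = u1 = 0.0
    i00 = i01 = i11 = 0.0
    loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)  # Bernoulli variance acts as the weight
        u0 += yi - p
        u1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
        # Log-likelihood contribution: y*log(p) + (1 - y)*log(1 - p)
        loglik += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return (u0, u1), (i00, i01, i11), loglik


def fit_logistic(x, y, tol=1e-10, max_iter=50):
    """Maximum likelihood fit by Newton-Raphson (hypothetical helper)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (u0, u1), (i00, i01, i11), _ = _score_info_loglik(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step: delta = I^{-1} U for the 2x2 information matrix
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    else:
        raise RuntimeError("Newton-Raphson did not converge")
    # Re-evaluate at the final estimate to get covariance and log-likelihood
    _, (i00, i01, i11), loglik = _score_info_loglik(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return (b0, b1), cov, loglik


# Usage with a binary predictor (toy data)
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
(b0, b1), cov, ll = fit_logistic(x, y)
print(b0, b1, cov[1][1], ll)  # b1 is the sample log odds ratio, log(9) ≈ 2.197
```

With a binary predictor, the MLE reproduces the sample log odds ratio. Here `b1` should therefore equal log(9), with variance 1/3 + 1 + 1 + 1/3 = 8/3, which is a convenient check. Production fitting code adds safeguards such as step halving and detection of complete separation, which this sketch omits.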


Notes

  1. In linear regression, a t distribution is used to account for the fact that the variance of Y | X is estimated. In models such as the logistic model, there is no separate variance parameter to estimate. Gould’s simulations show that, for binary logistic regression, the normal distribution provides more accurate P-values than the t distribution.

  2. For example, in a 3-treatment comparison one could examine contrasts between treatments A and B, A and C, and B and C by obtaining predicted values for those treatments, even though only two differences are required.

  3. The rms command could be contrast(fit, list(sex='male', age=30), list(sex='female', age=40)), where all other predictors are set to medians or modes.

  4. This is the basis for confidence limits computed by the R rms package’s Predict, summary, and contrast functions. When the robcov function has been used to replace the information-matrix-based covariance matrix with a Huber robust covariance estimate (with an optional cluster sampling correction), these functions use a “robust” Wald statistic basis. When the bootcov function has been used to replace the model fit’s covariance matrix with a bootstrap unconditional covariance matrix estimate, these functions compute confidence limits based on a normal distribution but with a more nonparametric covariance estimate.

  5. As indicated below, this standard deviation can also be obtained by using the summary function on the object returned by bootcov, because bootcov returns a fit object like one from lrm except with the bootstrap covariance matrix substituted for the information-based one.

  6. Limited simulations using the conditional bootstrap and Firth’s penalized likelihood [281] did not show significant improvement in confidence interval coverage.

  7. Several examples from simulated datasets have shown that using BIC to choose a penalty results in far too much shrinkage.
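The Wald, normal-theory basis for contrasts and confidence limits described in notes 1 through 4 can be sketched as follows. The sketch assumes only a coefficient vector and its covariance matrix are available. It is not the rms implementation, and the function name `wald_contrast` and all numbers are hypothetical. Supplying a robust (sandwich) or bootstrap covariance matrix in place of the information-based one gives the variants mentioned in note 4 without changing anything else.

```python
from statistics import NormalDist


def wald_contrast(beta, cov, c, level=0.95):
    """Estimate, SE, z, two-sided P, and CI for the contrast c'beta.

    Uses the standard normal (not t) reference distribution, as is
    usual for models without a separate variance parameter.
    """
    k = len(beta)
    est = sum(c[j] * beta[j] for j in range(k))
    # Var(c'b) = c' V c, where V is whatever covariance estimate is supplied
    var = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    se = var ** 0.5
    z = est / se
    nd = NormalDist()
    p = 2.0 * (1.0 - nd.cdf(abs(z)))
    zcrit = nd.inv_cdf(0.5 + level / 2.0)
    return est, se, z, p, (est - zcrit * se, est + zcrit * se)


# Hypothetical 3-treatment model: intercept, dummy for B, dummy for C (A is reference).
beta = [-0.5, 0.8, 0.3]
cov = [[0.04, -0.04, -0.04],
       [-0.04, 0.10, 0.04],
       [-0.04, 0.04, 0.09]]
# Contrast B versus C: difference in predicted values, (b0 + bB) - (b0 + bC)
est, se, z, p, ci = wald_contrast(beta, cov, [0, 1, -1])
print(est, se, z, p, ci)  # est = 0.5, se = sqrt(0.11) ≈ 0.332
```

Working with a single contrast vector means the A versus B, A versus C, and B versus C comparisons from note 2 all come from the same fit. The only change is the vector `c`.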

References

  1. O. O. Al-Radi, F. E. Harrell, C. A. Caldarone, B. W. McCrindle, J. P. Jacobs, M. G. Williams, G. S. Van Arsdell, and W. G. Williams. Case complexity scores in congenital heart surgery: A comparative study of the Aristotle Basic Complexity score and the Risk Adjustment in Congenital Heart Surgery (RACHS-1) system. J Thorac Cardiovasc Surg, 133:865–874, 2007.
  2. J. M. Alho. On the computation of likelihood ratio and score test based confidence intervals in generalized linear models. Stat Med, 11:923–930, 1992.
  3. A. C. Atkinson. A note on the generalized information criterion for choice of a model. Biometrika, 67:413–418, 1980.
  4. D. A. Binder. Fitting Cox’s proportional hazards models from survey data. Biometrika, 79:139–147, 1992.
  5. D. D. Boos. On generalized score tests. Am Statistician, 46:327–333, 1992.
  6. A. R. Brazzale and A. C. Davison. Accurate parametric inference for small samples. Statistical Sci, 23(4):465–484, 2008.
  7. L. Breiman. The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. J Am Stat Assoc, 87:738–754, 1992.
  8. S. T. Buckland, K. P. Burnham, and N. H. Augustin. Model selection: An integral part of inference. Biometrics, 53:603–618, 1997.
  9. R. M. Califf, H. R. Phillips, and others. Prognostic value of a coronary artery jeopardy score. J Am College Cardiol, 5:1055–1063, 1985.
  10. J. Carpenter and J. Bithell. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med, 19:1141–1164, 2000.
  11. L. E. Chambless and K. E. Boyle. Maximum likelihood methods for complex sample data: Logistic regression and discrete proportional hazards models. Comm Stat A, 14:1377–1392, 1985.
  12. C. Chatfield. Model uncertainty, data mining and statistical inference (with discussion). J Roy Stat Soc A, 158:419–466, 1995.
  13. D. Collett. Modelling Binary Data. Chapman and Hall, London, second edition, 2002.
  14. D. R. Cox. Further results on tests of separate families of hypotheses. J Roy Stat Soc B, 24:406–424, 1962.
  15. D. R. Cox. Regression models and life-tables (with discussion). J Roy Stat Soc B, 34:187–220, 1972.
  16. D. R. Cox and E. J. Snell. The Analysis of Binary Data. Chapman and Hall, London, second edition, 1989.
  17. D. R. Cox and N. Wermuth. A comment on the coefficient of determination for binary responses. Am Statistician, 46:1–4, 1992.
  18. J. G. Cragg and R. Uhler. The demand for automobiles. Canadian Journal of Economics, 3:386–406, 1970.
  19. T. DiCiccio and B. Efron. More accurate confidence intervals in exponential families. Biometrika, 79:231–245, 1992.
  20. N. Doganaksoy and J. Schmee. Comparisons of approximate confidence intervals for distributions used in life-data analysis. Technometrics, 35:175–184, 1993.
  21. M. Drum and P. McCullagh. Comment on regression models for discrete longitudinal responses by G. M. Fitzmaurice, N. M. Laird, and A. G. Rotnitzky. Statistical Sci, 8:300–301, 1993.
  22. B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Sci, 1:54–77, 1986.
  23. B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, New York, 1993.
  24. Z. Feng, D. McLerran, and J. Grizzle. A comparison of statistical methods for clustered data analysis with Gaussian error. Stat Med, 15:1793–1806, 1996.
  25. G. M. Fitzmaurice. A caveat concerning independence estimating equations with multivariate binary data. Biometrics, 51:309–317, 1995.
  26. J. Fox. Bootstrapping regression models: An appendix to An R and S-PLUS Companion to Applied Regression, 2002.
  27. D. A. Freedman. On the so-called “Huber sandwich estimator” and “robust standard errors”. Am Statistician, 60:299–302, 2006.
  28. J. H. Friedman. A variable span smoother. Technical Report 5, Laboratory for Computational Statistics, Department of Statistics, Stanford University, 1984.
  29. R. Goldstein. The comparison of models in discrimination cases. Jurimetrics J, 34:215–234, 1994.
  30. W. Gould. Confidence intervals in logit and probit models. Stata Tech Bull, STB-14:26–28, July 1993. http://www.stata.com/products/stb/journals/stb14.pdf.
  31. B. I. Graubard and E. L. Korn. Regression analysis with clustered data. Stat Med, 13:509–522, 1994.
  32. R. J. Gray. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc, 87:942–951, 1992.
  33. S. Greenland. When should epidemiologic regressions use random coefficients? Biometrics, 56:915–921, 2000.
  34. F. E. Harrell and K. L. Lee. A comparison of the discrimination of discriminant analysis and logistic regression under multivariate normality. In P. K. Sen, editor, Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences. The Bernard G. Greenberg Volume, pages 333–343. North-Holland, Amsterdam, 1985.
  35. W. W. Hauck and A. Donner. Wald’s test as applied to hypotheses in logit analysis. J Am Stat Assoc, 72:851–863, 1977.
  36. G. Heinze and M. Schemper. A solution to the problem of separation in logistic regression. Stat Med, 21(16):2409–2419, 2002.
  37. T. Hothorn, F. Bretz, and P. Westfall. Simultaneous inference in general parametric models. Biometrical J, 50(3):346–363, 2008.
  38. J. Huang and D. Harrington. Penalized partial likelihood regression for right-censored data with bootstrap selection of the penalty parameter. Biometrics, 58:781–791, 2002.
  39. P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1: Statistics, pages 221–233. University of California Press, Berkeley, CA, 1967.
  40. C. M. Hurvich and C. Tsai. Regression and time series model selection in small samples. Biometrika, 76:297–307, 1989.
  41. C. M. Hurvich and C. Tsai. Model selection for extended quasi-likelihood models in small samples. Biometrics, 51:1077–1084, 1995.
  42. R. E. Kass and A. E. Raftery. Bayes factors. J Am Stat Assoc, 90:773–795, 1995.
  43. S. Konishi and G. Kitagawa. Information Criteria and Statistical Modeling. Springer, New York, 2008. ISBN 978-0-387-71886-6.
  44. E. L. Korn and B. I. Graubard. Analysis of large health surveys: Accounting for the sampling design. J Roy Stat Soc A, 158:263–295, 1995.
  45. E. L. Korn and B. I. Graubard. Examples of differing weighted and unweighted estimates from a sample survey. Am Statistician, 49:291–295, 1995.
  46. E. L. Korn and R. Simon. Measures of explained variation for survival data. Stat Med, 9:487–503, 1990.
  47. E. L. Korn and R. Simon. Explained residual variation, explained risk, and goodness of fit. Am Statistician, 45:201–206, 1991.
  48. T. P. Lane and W. H. DuMouchel. Simultaneous confidence intervals in multiple regression. Am Statistician, 48:315–321, 1994.
  49. P. W. Laud and J. G. Ibrahim. Predictive model selection. J Roy Stat Soc B, 57:247–262, 1995.
  50. S. le Cessie and J. C. van Houwelingen. Ridge estimators in logistic regression. Appl Stat, 41:191–201, 1992.
  51. E. W. Lee, L. J. Wei, and D. A. Amato. Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In J. P. Klein and P. K. Goel, editors, Survival Analysis: State of the Art, NATO ASI, pages 237–247. Kluwer Academic, Boston, 1992.
  52. K. L. Lee, D. B. Pryor, F. E. Harrell, R. M. Califf, V. S. Behar, W. L. Floyd, J. J. Morris, R. A. Waugh, R. E. Whalen, and R. A. Rosati. Predicting outcome in coronary disease: Statistical models versus expert clinicians. Am J Med, 80:553–560, 1986.
  53. D. Y. Lin. Cox regression analysis of multivariate failure time data: The marginal approach. Stat Med, 13:2233–2247, 1994.
  54. D. Y. Lin. On fitting Cox’s proportional hazards models to survey data. Biometrika, 87:37–47, 2000.
  55. D. Y. Lin and L. J. Wei. The robust inference for the Cox proportional hazards model. J Am Stat Assoc, 84:1074–1078, 1989.
  56. K. Liu and A. R. Dyer. A rank statistic for assessing the amount of variation explained by risk factors in epidemiologic studies. Am J Epi, 109:597–606, 1979.
  57. J. S. Long and L. H. Ervin. Using heteroscedasticity consistent standard errors in the linear regression model. Am Statistician, 54:217–224, 2000.
  58. G. S. Maddala. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, Cambridge, UK, 1983.
  59. L. Magee. R² measures based on Wald and likelihood ratio joint significance tests. Am Statistician, 44:250–253, 1990.
  60. E. Marubini and M. G. Valsecchi. Analyzing Survival Data from Clinical Trials and Observational Studies. Wiley, Chichester, 1995.
  61. W. Q. Meeker and L. A. Escobar. Teaching about approximate confidence regions based on maximum likelihood estimation. Am Statistician, 49:48–53, 1995.
  62. S. Menard. Coefficients of determination for multiple logistic regression analysis. Am Statistician, 54:17–24, 2000.
  63. S. Minkin. Profile-likelihood-based confidence intervals. Appl Stat, 39:125–126, 1990.
  64. M. Mittlböck and M. Schemper. Explained variation for logistic regression. Stat Med, 15:1987–1997, 1996.
  65. K. G. M. Moons, Donders, E. W. Steyerberg, and F. E. Harrell. Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clin Epi, 57:1262–1270, 2004.
  66. B. J. T. Morgan, K. J. Palmer, and M. S. Ridout. Negative score test statistic (with discussion). Am Statistician, 61(4):285–295, 2007.
  67. N. J. D. Nagelkerke. A note on a general definition of the coefficient of determination. Biometrika, 78:691–692, 1991.
  68. M. Y. Park and T. Hastie. Penalized logistic regression for detecting gene interactions. Biostat, 9(1):30–50, 2008.
  69. L. W. Pickle. Maximum likelihood estimation in the new computing environment. Stat Comp Graphics News ASA, 2(2):6–15, Nov. 1991.
  70. W. H. Rogers. Regression standard errors in clustered samples. Stata Tech Bull, STB-13:19–23, May 1993. http://www.stata.com/products/stb/journals/stb13.pdf.
  71. P. Royston and S. G. Thompson. Comparing non-nested regression models. Biometrics, 51:114–127, 1995.
  72. S. Sardy. On the practice of rescaling covariates. Int Stat Rev, 76:285–297, 2008.
  73. M. Schemper. The relative importance of prognostic factors in studies of survival. Stat Med, 12:2377–2382, 1993.
  74. M. Schemper and J. Stare. Explained variation in survival analysis. Stat Med, 15:1999–2012, 1996.
  75. G. Schwarz. Estimating the dimension of a model. Ann Stat, 6:461–464, 1978.
  76. A. F. M. Smith and D. J. Spiegelhalter. Bayes factors and choice criteria for linear models. J Roy Stat Soc B, 42:213–220, 1980.
  77. T. M. Therneau, P. M. Grambsch, and T. R. Fleming. Martingale-based residuals for survival models. Biometrika, 77:216–218, 1990.
  78. R. Tibshirani. Regression shrinkage and selection via the lasso. J Roy Stat Soc B, 58:267–288, 1996.
  79. R. Tibshirani and K. Knight. Model search and inference by bootstrap “bumping”. Technical report, Department of Statistics, University of Toronto, 1997. http://www-stat.stanford.edu/tibs. Presented at the Joint Statistical Meetings, Chicago, August 1996.
  80. H. C. van Houwelingen and J. Thorogood. Construction, validation and updating of a prognostic model for kidney graft survival. Stat Med, 14:1999–2008, 1995.
  81. J. C. van Houwelingen and S. le Cessie. Predictive value of statistical models. Stat Med, 9:1303–1325, 1990.
  82. D. J. Venzon and S. H. Moolgavkar. A method for computing profile-likelihood-based confidence intervals. Appl Stat, 37:87–94, 1988.
  83. P. Verweij and H. C. van Houwelingen. Penalized likelihood in Cox regression. Stat Med, 13:2427–2436, 1994.
  84. P. J. M. Verweij and H. C. van Houwelingen. Cross-validation in survival analysis. Stat Med, 12:2305–2314, 1993.
  85. P. J. M. Verweij and H. C. van Houwelingen. Time-dependent effects of fixed covariates in Cox regression. Biometrics, 51:1550–1556, 1995.
  86. Y. Wang and J. M. G. Taylor. Inference for smooth curves in longitudinal data with application to an AIDS clinical trial. Stat Med, 14:1205–1218, 1995.
  87. H. White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48:817–838, 1980.
  88. J. Whittaker. Model interpretation from the additive elements of the likelihood function. Appl Stat, 33:52–64, 1984.
  89. A. R. Willan, W. Ross, and T. A. MacKenzie. Comparing in-patient classification systems: A problem of non-nested regression models. Stat Med, 11:1321–1331, 1992.
  90. Y. Xiao and M. Abrahamowicz. Bootstrap-based methods for estimating standard errors in Cox’s regression analyses of clustered event times. Stat Med, 29:915–923, 2010.
  91. B. Zheng and A. Agresti. Summarizing the predictive power of a generalized linear model. Stat Med, 19:1771–1781, 2000.
  92. X. Zheng and W. Loh. Consistent variable selection in linear models. J Am Stat Assoc, 90:151–156, 1995.


Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Harrell, F.E. (2015). Overview of Maximum Likelihood Estimation. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_9
