Overview of Maximum Likelihood Estimation

Harrell, Frank E.

doi:10.1007/978-3-319-19425-7_9

Frank E. Harrell Jr.

Part of the book series: Springer Series in Statistics (SSS)

Abstract

In ordinary least squares multiple regression, the objective in fitting a model is to find the values of the unknown parameters that minimize the sum of squared errors of prediction. When the response variable is non-normal, polytomous, or not observed completely, one needs a more general objective function to optimize.
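As a rough illustration of the more general objective the abstract refers to, the sketch below maximizes a Bernoulli log likelihood for a binary response using Newton-Raphson steps. It is a minimal example under stated assumptions, not code from the chapter: the function names `log_likelihood` and `fit_logistic`, the single-predictor logistic model, and the eight data points are all invented for illustration.

```python
import math

def log_likelihood(beta, xs, ys):
    """Bernoulli log likelihood for a logistic model with intercept and one slope."""
    b0, b1 = beta
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # log(1 + exp(eta)), computed in a numerically stable way
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += y * eta - log1pexp
    return total

def fit_logistic(xs, ys, iterations=25):
    """Maximize the log likelihood by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # score vector (first derivatives)
        h00 = h01 = h11 = 0.0    # information matrix (negative second derivatives)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: a binary response that tends to be 1 at larger x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logistic(xs, ys)
print("estimates:", beta)
print("log likelihood at estimates:", log_likelihood(beta, xs, ys))
print("log likelihood at (0, 0):   ", log_likelihood((0.0, 0.0), xs, ys))
```

There is no sum of squared errors to minimize here. The estimates are the parameter values under which the observed 0/1 responses are most probable, which is the role the likelihood plays when the response is not normal.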

Notes

1. In linear regression, a t distribution is used to penalize for the fact that the variance of Y | X is estimated. In models such as the logistic model, there is no separate variance parameter to estimate. Gould has done simulations that show that the normal distribution provides more accurate P-values than the t for binary logistic regression.
2. For example, in a 3-treatment comparison one could examine contrasts between treatments A and B, A and C, and B and C by obtaining predicted values for those treatments, even though only two differences are required.
3. The rms command could be contrast(fit, list(sex='male', age=30), list(sex='female', age=40)), where all other predictors are set to medians or modes.
4. This is the basis for confidence limits computed by the R rms package's Predict, summary, and contrast functions. When the robcov function has been used to replace the information-matrix-based covariance matrix with a Huber robust covariance estimate with an optional cluster sampling correction, the functions are using a "robust" Wald statistic basis. When the bootcov function has been used to replace the model fit's covariance matrix with a bootstrap unconditional covariance matrix estimate, the two functions are computing confidence limits based on a normal distribution but using more nonparametric covariance estimates.
5. As indicated below, this standard deviation can also be obtained by using the summary function on the object returned by bootcov, as bootcov returns a fit object like one from lrm except with the bootstrap covariance matrix substituted for the information-based one.
6. Limited simulations using the conditional bootstrap and Firth's penalized likelihood [281] did not show significant improvement in confidence interval coverage.
7. Several examples from simulated datasets have shown that using BIC to choose a penalty results in far too much shrinkage.
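The contrast and confidence-limit notes above can be made concrete with a small, language-neutral sketch of the arithmetic. A contrast between two covariate settings is a difference of linear predictors, and Wald limits use a standard normal quantile rather than a t quantile. This is not the rms implementation: the function `wald_contrast`, the design-row encoding (intercept, male indicator, age), and every coefficient and covariance value below are hypothetical.

```python
from statistics import NormalDist

def wald_contrast(beta, cov, x_a, x_b, level=0.95):
    """Wald estimate, standard error, and normal-based limits for x_a'beta - x_b'beta."""
    d = [a - b for a, b in zip(x_a, x_b)]            # difference of design rows
    est = sum(di * bi for di, bi in zip(d, beta))    # d' beta
    n = len(d)
    var = sum(d[i] * cov[i][j] * d[j] for i in range(n) for j in range(n))  # d' V d
    se = var ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2.0)      # normal, not t, critical value
    return est, se, est - z * se, est + z * se

# Hypothetical logistic fit: columns are (intercept, sex == male, age).
beta = [-2.0, 0.5, 0.03]
cov = [
    [0.25, 0.0,    0.0],
    [0.0,  0.04,   0.0005],
    [0.0,  0.0005, 0.0001],
]

male_30 = [1, 1, 30]     # sex = male, age = 30
female_40 = [1, 0, 40]   # sex = female, age = 40

est, se, lower, upper = wald_contrast(beta, cov, male_30, female_40)
print("log odds difference:", est)
print("standard error:", se)
print("0.95 Wald limits:", (lower, upper))
```

Passing a robust or bootstrap covariance matrix as `cov` changes only the standard error. The estimate and the normal critical value stay the same, which matches the notes' description of robust and bootstrap covariance estimates being substituted for the information-based one.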

References

O. O. Al-Radi, F. E. Harrell, C. A. Caldarone, B. W. McCrindle, J. P. Jacobs, M. G. Williams, G. S. Van Arsdell, and W. G. Williams. Case complexity scores in congenital heart surgery: A comparative study of the Aristotle Basic Complexity score and the Risk Adjustment in Congenital Heart Surgery (RACHS-1) system. J Thorac Cardiovasc Surg, 133:865–874, 2007.
J. M. Alho. On the computation of likelihood ratio and score test based confidence intervals in generalized linear models. Stat Med, 11:923–930, 1992.
A. C. Atkinson. A note on the generalized information criterion for choice of a model. Biometrika, 67:413–418, 1980.
D. A. Binder. Fitting Cox’s proportional hazards models from survey data. Biometrika, 79:139–147, 1992.
D. D. Boos. On generalized score tests. Am Statistician, 46:327–333, 1992.
A. R. Brazzale and A. C. Davison. Accurate parametric inference for small samples. Statistical Sci, 23(4):465–484, 2008.
L. Breiman. The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error. J Am Stat Assoc, 87:738–754, 1992.
S. T. Buckland, K. P. Burnham, and N. H. Augustin. Model selection: An integral part of inference. Biometrics, 53:603–618, 1997.
R. M. Califf, H. R. Phillips, and Others. Prognostic value of a coronary artery jeopardy score. J Am College Cardiol, 5:1055–1063, 1985.
J. Carpenter and J. Bithell. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med, 19:1141–1164, 2000.
L. E. Chambless and K. E. Boyle. Maximum likelihood methods for complex sample data: Logistic regression and discrete proportional hazards models. Comm Stat A, 14:1377–1392, 1985.
C. Chatfield. Model uncertainty, data mining and statistical inference (with discussion). J Roy Stat Soc A, 158:419–466, 1995.
D. Collett. Modelling Binary Data. Chapman and Hall, London, second edition, 2002.
D. R. Cox. Further results on tests of separate families of hypotheses. J Roy Stat Soc B, 24:406–424, 1962.
D. R. Cox. Regression models and life-tables (with discussion). J Roy Stat Soc B, 34:187–220, 1972.
D. R. Cox and E. J. Snell. The Analysis of Binary Data. Chapman and Hall, London, second edition, 1989.
D. R. Cox and N. Wermuth. A comment on the coefficient of determination for binary responses. Am Statistician, 46:1–4, 1992.
J. G. Cragg and R. Uhler. The demand for automobiles. Canadian Journal of Economics, 3:386–406, 1970.
T. DiCiccio and B. Efron. More accurate confidence intervals in exponential families. Biometrika, 79:231–245, 1992.
N. Doganaksoy and J. Schmee. Comparisons of approximate confidence intervals for distributions used in life-data analysis. Technometrics, 35:175–184, 1993.
M. Drum and P. McCullagh. Comment on regression models for discrete longitudinal responses by G. M. Fitzmaurice, N. M. Laird, and A. G. Rotnitzky. Stat Sci, 8:300–301, 1993.
B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Sci, 1:54–77, 1986.
B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman and Hall, New York, 1993.
Z. Feng, D. McLerran, and J. Grizzle. A comparison of statistical methods for clustered data analysis with Gaussian error. Stat Med, 15:1793–1806, 1996.
G. M. Fitzmaurice. A caveat concerning independence estimating equations with multivariate binary data. Biometrics, 51:309–317, 1995.
J. Fox. Bootstrapping Regression Models: An Appendix to An R and S-PLUS Companion to Applied Regression, 2002.
D. A. Freedman. On the so-called “Huber sandwich estimator” and “robust standard errors”. Am Statistician, 60:299–302, 2006.
J. H. Friedman. A variable span smoother. Technical Report 5, Laboratory for Computational Statistics, Department of Statistics, Stanford University, 1984.
R. Goldstein. The comparison of models in discrimination cases. Jurimetrics J, 34:215–234, 1994.
W. Gould. Confidence intervals in logit and probit models. Stata Tech Bull, STB-14:26–28, July 1993. http://www.stata.com/products/stb/journals/stb14.pdf.
B. I. Graubard and E. L. Korn. Regression analysis with clustered data. Stat Med, 13:509–522, 1994.
R. J. Gray. Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Stat Assoc, 87:942–951, 1992.
S. Greenland. When should epidemiologic regressions use random coefficients? Biometrics, 56:915–921, 2000.
F. E. Harrell and K. L. Lee. A comparison of the discrimination of discriminant analysis and logistic regression under multivariate normality. In P. K. Sen, editor, Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences. The Bernard G. Greenberg Volume, pages 333–343. North-Holland, Amsterdam, 1985.
W. W. Hauck and A. Donner. Wald’s test as applied to hypotheses in logit analysis. J Am Stat Assoc, 72:851–863, 1977.
G. Heinze and M. Schemper. A solution to the problem of separation in logistic regression. Stat Med, 21(16):2409–2419, 2002.
T. Hothorn, F. Bretz, and P. Westfall. Simultaneous inference in general parametric models. Biometrical J, 50(3):346–363, 2008.
J. Huang and D. Harrington. Penalized partial likelihood regression for right-censored data with bootstrap selection of the penalty parameter. Biometrics, 58:781–791, 2002.
P. J. Huber. The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1: Statistics, pages 221–233. University of California Press, Berkeley, CA, 1967.
C. M. Hurvich and C. Tsai. Regression and time series model selection in small samples. Biometrika, 76:297–307, 1989.
C. M. Hurvich and C. Tsai. Model selection for extended quasi-likelihood models in small samples. Biometrics, 51:1077–1084, 1995.
R. E. Kass and A. E. Raftery. Bayes factors. J Am Stat Assoc, 90:773–795, 1995.
S. Konishi and G. Kitagawa. Information Criteria and Statistical Modeling. Springer, New York, 2008. ISBN 978-0-387-71886-6.
E. L. Korn and B. I. Graubard. Analysis of large health surveys: Accounting for the sampling design. J Roy Stat Soc A, 158:263–295, 1995.
E. L. Korn and B. I. Graubard. Examples of differing weighted and unweighted estimates from a sample survey. Am Statistician, 49:291–295, 1995.
E. L. Korn and R. Simon. Measures of explained variation for survival data. Stat Med, 9:487–503, 1990.
E. L. Korn and R. Simon. Explained residual variation, explained risk, and goodness of fit. Am Statistician, 45:201–206, 1991.
T. P. Lane and W. H. DuMouchel. Simultaneous confidence intervals in multiple regression. Am Statistician, 48:315–321, 1994.
P. W. Laud and J. G. Ibrahim. Predictive model selection. J Roy Stat Soc B, 57:247–262, 1995.
S. le Cessie and J. C. van Houwelingen. Ridge estimators in logistic regression. Appl Stat, 41:191–201, 1992.
E. W. Lee, L. J. Wei, and D. A. Amato. Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In J. P. Klein and P. K. Goel, editors, Survival Analysis: State of the Art, NATO ASI, pages 237–247. Kluwer Academic, Boston, 1992.
K. L. Lee, D. B. Pryor, F. E. Harrell, R. M. Califf, V. S. Behar, W. L. Floyd, J. J. Morris, R. A. Waugh, R. E. Whalen, and R. A. Rosati. Predicting outcome in coronary disease: Statistical models versus expert clinicians. Am J Med, 80:553–560, 1986.
D. Y. Lin. Cox regression analysis of multivariate failure time data: The marginal approach. Stat Med, 13:2233–2247, 1994.
D. Y. Lin. On fitting Cox’s proportional hazards models to survey data. Biometrika, 87:37–47, 2000.
D. Y. Lin and L. J. Wei. The robust inference for the Cox proportional hazards model. J Am Stat Assoc, 84:1074–1078, 1989.
K. Liu and A. R. Dyer. A rank statistic for assessing the amount of variation explained by risk factors in epidemiologic studies. Am J Epi, 109:597–606, 1979.
J. S. Long and L. H. Ervin. Using heteroscedasticity consistent standard errors in the linear regression model. Am Statistician, 54:217–224, 2000.
G. S. Maddala. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, Cambridge, UK, 1983.
L. Magee. R² measures based on Wald and likelihood ratio joint significance tests. Am Statistician, 44:250–253, 1990.
E. Marubini and M. G. Valsecchi. Analyzing Survival Data from Clinical Trials and Observational Studies. Wiley, Chichester, 1995.
W. Q. Meeker and L. A. Escobar. Teaching about approximate confidence regions based on maximum likelihood estimation. Am Statistician, 49:48–53, 1995.
S. Menard. Coefficients of determination for multiple logistic regression analysis. Am Statistician, 54:17–24, 2000.
S. Minkin. Profile-likelihood-based confidence intervals. Appl Stat, 39:125–126, 1990.
M. Mittlböck and M. Schemper. Explained variation for logistic regression. Stat Med, 15:1987–1997, 1996.
K. G. M. Moons, A. R. T. Donders, E. W. Steyerberg, and F. E. Harrell. Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. J Clin Epi, 57:1262–1270, 2004.
B. J. T. Morgan, K. J. Palmer, and M. S. Ridout. Negative score test statistic (with discussion). Am Statistician, 61(4):285–295, 2007.
N. J. D. Nagelkerke. A note on a general definition of the coefficient of determination. Biometrika, 78:691–692, 1991.
M. Y. Park and T. Hastie. Penalized logistic regression for detecting gene interactions. Biostat, 9(1):30–50, 2008.
L. W. Pickle. Maximum likelihood estimation in the new computing environment. Stat Comp Graphics News ASA, 2(2):6–15, Nov. 1991.
W. H. Rogers. Regression standard errors in clustered samples. Stata Tech Bull, STB-13:19–23, May 1993. http://www.stata.com/products/stb/journals/stb13.pdf.
P. Royston and S. G. Thompson. Comparing non-nested regression models. Biometrics, 51:114–127, 1995.
S. Sardy. On the practice of rescaling covariates. Int Stat Rev, 76:285–297, 2008.
M. Schemper. The relative importance of prognostic factors in studies of survival. Stat Med, 12:2377–2382, 1993.
M. Schemper and J. Stare. Explained variation in survival analysis. Stat Med, 15:1999–2012, 1996.
G. Schwarz. Estimating the dimension of a model. Ann Stat, 6:461–464, 1978.
A. F. M. Smith and D. J. Spiegelhalter. Bayes factors and choice criteria for linear models. J Roy Stat Soc B, 42:213–220, 1980.
T. M. Therneau, P. M. Grambsch, and T. R. Fleming. Martingale-based residuals for survival models. Biometrika, 77:216–218, 1990.
R. Tibshirani. Regression shrinkage and selection via the lasso. J Roy Stat Soc B, 58:267–288, 1996.
R. Tibshirani and K. Knight. Model search and inference by bootstrap “bumping”. Technical report, Department of Statistics, University of Toronto, 1997. http://www-stat.stanford.edu/tibs. Presented at the Joint Statistical Meetings, Chicago, August 1996.
H. C. van Houwelingen and J. Thorogood. Construction, validation and updating of a prognostic model for kidney graft survival. Stat Med, 14:1999–2008, 1995.
J. C. van Houwelingen and S. le Cessie. Predictive value of statistical models. Stat Med, 9:1303–1325, 1990.
D. J. Venzon and S. H. Moolgavkar. A method for computing profile-likelihood-based confidence intervals. Appl Stat, 37:87–94, 1988.
P. Verweij and H. C. van Houwelingen. Penalized likelihood in Cox regression. Stat Med, 13:2427–2436, 1994.
P. J. M. Verweij and H. C. van Houwelingen. Cross-validation in survival analysis. Stat Med, 12:2305–2314, 1993.
P. J. M. Verweij and H. C. van Houwelingen. Time-dependent effects of fixed covariates in Cox regression. Biometrics, 51:1550–1556, 1995.
Y. Wang and J. M. G. Taylor. Inference for smooth curves in longitudinal data with application to an AIDS clinical trial. Stat Med, 14:1205–1218, 1995.
H. White. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48:817–838, 1980.
J. Whittaker. Model interpretation from the additive elements of the likelihood function. Appl Stat, 33:52–64, 1984.
A. R. Willan, W. Ross, and T. A. MacKenzie. Comparing in-patient classification systems: A problem of non-nested regression models. Stat Med, 11:1321–1331, 1992.
Y. Xiao and M. Abrahamowicz. Bootstrap-based methods for estimating standard errors in Cox’s regression analyses of clustered event times. Stat Med, 29:915–923, 2010.
B. Zheng and A. Agresti. Summarizing the predictive power of a generalized linear model. Stat Med, 19:1771–1781, 2000.
X. Zheng and W. Loh. Consistent variable selection in linear models. J Am Stat Assoc, 90:151–156, 1995.

Author information

Authors and Affiliations

Department of Biostatistics, School of Medicine Vanderbilt University, Nashville, TN, USA
Frank E. Harrell Jr.

About this chapter

Cite this chapter

Harrell, F.E. (2015). Overview of Maximum Likelihood Estimation. In: Regression Modeling Strategies. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-19425-7_9

DOI: https://doi.org/10.1007/978-3-319-19425-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19424-0
Online ISBN: 978-3-319-19425-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)