Binary Logistic Regression

  • Frank E. Harrell Jr.
Part of the Springer Series in Statistics book series (SSS)


Binary responses are commonly studied in many fields. Examples include the presence or absence of a particular disease, death during surgery, or a consumer purchasing a product. Often one wishes to study how a set of predictor variables X is related to a dichotomous response variable Y. The predictors may describe such quantities as treatment assignment, dosage, risk factors, and calendar time. For convenience we define the response to be Y = 0 or 1, with Y = 1 denoting the occurrence of the event of interest. Often a dichotomous outcome can be studied by calculating certain proportions, for example, the proportion of deaths among females and the proportion among males. However, in many situations, there are multiple descriptors, or one or more of the descriptors are continuous. Without a statistical model, studying patterns such as the relationship between age and occurrence of a disease, for example, would require the creation of arbitrary age groups to allow estimation of disease prevalence as a function of age.
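The contrast drawn above, between estimating prevalence within arbitrary age groups and modeling age as a continuous predictor, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the data are simulated, the cutpoints 45 and 60 are arbitrary, and the helper names grouped_prevalence, fit_logistic, and predicted_risk are hypothetical. The model fitted is the standard one-predictor logistic form Prob(Y = 1 | age) = 1 / (1 + exp(-(b0 + b1 * age))), estimated here by plain Newton-Raphson iterations.

```python
import math
import random


def grouped_prevalence(ages, events, cutpoints):
    """Proportion with Y = 1 inside age groups defined by arbitrary cutpoints."""
    edges = [-math.inf, *cutpoints, math.inf]
    proportions = []
    for low, high in zip(edges[:-1], edges[1:]):
        group = [e for a, e in zip(ages, events) if low <= a < high]
        proportions.append(sum(group) / len(group) if group else float("nan"))
    return proportions


def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of a one-predictor logistic model (Newton-Raphson)."""
    x_mean = sum(x) / len(x)
    xc = [xi - x_mean for xi in x]  # center the predictor for numerical stability
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * x_mean, b1     # intercept on the original (uncentered) scale


# Simulated data only (assumed true model: logit = -5 + 0.08 * age).
random.seed(1)
ages = [random.uniform(30, 80) for _ in range(2000)]
events = [
    1 if random.random() < 1.0 / (1.0 + math.exp(-(-5 + 0.08 * a))) else 0
    for a in ages
]

# Approach 1: prevalence within arbitrary age groups (<45, 45-59, >=60).
by_group = grouped_prevalence(ages, events, cutpoints=[45, 60])

# Approach 2: a model that treats age as continuous.
intercept, slope = fit_logistic(ages, events)


def predicted_risk(age):
    """Estimated Prob(Y = 1) at a given age from the fitted model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * age)))
```

The grouped estimate yields one number per group and changes whenever the cutpoints move, whereas predicted_risk(age) returns an estimate at any age without grouping. The sketch assumes the effect of age is linear on the logit scale, which real data need not satisfy.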


Keywords: Logistic Model, Binary Logistic Regression, Spline Function, Wald Statistic, Brier Score



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Frank E. Harrell Jr., Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, USA
