Selection of Transformations of Continuous Predictors in Logistic Regression

  • Michael Chang
  • Rohan J. Dalpatadu
  • Ashok K. Singh
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 738)


Binary logistic regression is a machine learning tool for classification and discrimination that is widely used in business analytics and medical research. Transforming continuous predictors to improve the performance of a logistic regression model is common practice, but no systematic method for finding optimal transformations exists in the statistical or data mining literature. This paper considers the problem of selecting transformations of continuous predictors that improve the performance of logistic regression models. The proposed method is based on the point-biserial correlation coefficient between the binary response and a continuous predictor. Several examples illustrate the proposed method.
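The abstract says the selection method rests on the point-biserial correlation between the binary response and a continuous predictor, but it does not state the selection rule itself. The sketch below assumes one plausible rule: evaluate a small, hypothetical family of candidate transformations (identity, log, square root, square, reciprocal) and keep the one whose transformed values have the largest absolute point-biserial correlation with the response. The names `point_biserial`, `select_transformation`, and `CANDIDATES` are illustrative and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def point_biserial(x, y):
    """Point-biserial correlation between a continuous x and a binary y (0/1).

    r_pb = (M1 - M0) / s_n * sqrt(p * q), where M1 and M0 are the means of x
    in the y == 1 and y == 0 groups, s_n is the population standard deviation
    of x, and p, q are the proportions of y == 1 and y == 0. This equals the
    Pearson correlation between x and y coded as 0/1.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    if len(x1) + len(x0) != len(x):
        raise ValueError("y must be coded as 0/1")
    if not x1 or not x0:
        raise ValueError("y must contain both classes")
    s_n = pstdev(x)
    if s_n == 0:
        raise ValueError("x is constant")
    p = len(x1) / len(x)
    q = 1.0 - p
    return (mean(x1) - mean(x0)) / s_n * math.sqrt(p * q)


# Hypothetical candidate family; the paper's actual set may differ.
CANDIDATES = {
    "identity": lambda v: v,
    "log": math.log,
    "sqrt": math.sqrt,
    "square": lambda v: v * v,
    "reciprocal": lambda v: 1.0 / v,
}


def select_transformation(x, y, candidates=CANDIDATES):
    """Return (name, r_pb) for the candidate with the largest |r_pb| (assumed rule)."""
    scores = {}
    for name, f in candidates.items():
        try:
            scores[name] = point_biserial([f(v) for v in x], y)
        except (ValueError, ArithmeticError):
            # Skip transforms undefined on this data (e.g. log of 0, 1/0)
            # or that collapse x to a constant.
            continue
    if not scores:
        raise ValueError("no candidate transformation applies to x")
    best = max(scores, key=lambda k: abs(scores[k]))
    return best, scores[best]
```

For example, `select_transformation([1, 10, 100, 1000, 10000, 100000], [0, 0, 0, 1, 1, 1])` picks `"log"`: the log-transformed values are evenly spaced, so they track the class split far more linearly than the raw, heavily skewed values. Transformations that are undefined on the data are skipped rather than treated as errors.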


Keywords: Machine learning, Data mining, Precision, Recall, F1

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Michael Chang (1)
  • Rohan J. Dalpatadu (1)
  • Ashok K. Singh (2)
  1. Department of Mathematical Sciences, University of Nevada, Las Vegas, Las Vegas, USA
  2. William F. Harrah College of Hotel Administration, University of Nevada, Las Vegas, Las Vegas, USA