TOP (2008) 16:345

Optimizing logistic regression coefficients for discrimination and calibration using estimation of distribution algorithms

  • V. Robles
  • C. Bielza
  • P. Larrañaga
  • S. González
  • L. Ohno-Machado
Original Paper

Abstract

Logistic regression is a simple and efficient supervised learning algorithm for estimating the probability of an outcome or class variable. In spite of its simplicity, logistic regression has shown very good performance across a range of fields, and it is widely accepted because its results are easy to interpret. Fitting the logistic regression model usually relies on the principle of maximum likelihood; the Newton–Raphson algorithm is the most common numerical approach for obtaining the coefficients that maximize the likelihood of the data.
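
To make the standard fitting procedure concrete, the following is a minimal Newton–Raphson sketch for maximum-likelihood logistic regression. It is not from the paper; the function name `newton_raphson_logreg` and the synthetic data are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_raphson_logreg(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression coefficients by Newton-Raphson.

    X : (n, d) design matrix (include a column of ones for an intercept)
    y : (n,) binary labels in {0, 1}
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)               # predicted probabilities
        gradient = X.T @ (y - p)            # score vector of the log-likelihood
        W = p * (1.0 - p)                   # diagonal weights of the Hessian
        hessian = -(X * W[:, None]).T @ X   # Hessian of the log-likelihood
        step = np.linalg.solve(hessian, gradient)
        beta -= step                        # Newton update: beta - H^{-1} g
        if np.max(np.abs(step)) < tol:      # stop once the step is negligible
            break
    return beta

# Toy usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = (rng.random(200) < sigmoid(X @ np.array([-0.5, 1.0, -2.0]))).astype(float)
print(newton_raphson_logreg(X, y))
```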

This work presents a novel approach for fitting the logistic regression model based on estimation of distribution algorithms (EDAs), a tool for evolutionary computation. EDAs are suitable not only for maximizing the likelihood, but also for maximizing the area under the receiver operating characteristic curve (AUC).
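
As an illustration of how an EDA can search coefficient vectors for AUC directly, the sketch below uses a simple continuous EDA with independent Gaussians (a UMDA-c-style scheme with truncation selection). The paper's actual EDA variants and parameter settings may differ; `eda_maximize_auc`, its defaults, and the rank-based AUC helper are assumptions made for this example.

```python
import numpy as np

def auc(scores, y):
    """AUC via the Mann-Whitney rank-sum statistic (ties ignored for brevity)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def eda_maximize_auc(X, y, pop_size=100, n_select=50, n_gens=50, seed=0):
    """Evolve a Gaussian model over coefficient vectors, maximizing AUC."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    mu, sigma = np.zeros(d), np.ones(d)                        # initial model
    for _ in range(n_gens):
        pop = rng.normal(mu, sigma, size=(pop_size, d))        # sample candidates
        fitness = np.array([auc(X @ beta, y) for beta in pop]) # evaluate AUC
        elite = pop[np.argsort(fitness)[-n_select:]]           # truncation selection
        mu = elite.mean(axis=0)                                # re-estimate the model
        sigma = elite.std(axis=0) + 1e-6                       # avoid collapse to zero
    return mu, auc(X @ mu, y)
```

Because the AUC depends only on the ranking of the scores, it is piecewise constant in the coefficients and invariant to their scale, which is why a gradient-free, population-based search such as an EDA is a natural fit for this objective.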

Thus, we tackle the logistic regression problem from a dual perspective: likelihood-based, to calibrate the model, and AUC-based, to discriminate between the classes. Under these two objectives of calibration and discrimination, the Pareto front can be obtained within our EDA framework. These fronts are compared with those yielded by a multiobjective EDA recently introduced in the literature.
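
The dominance filter behind such a front can be sketched as follows: each candidate coefficient vector is scored on the log-likelihood (calibration) and the AUC (discrimination), and the non-dominated candidates approximate the Pareto front. This generic filter is illustrative and is not the paper's multiobjective EDA; the helper names are assumptions.

```python
import numpy as np

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of labels y under coefficients beta."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    eps = 1e-12                                   # guard against log(0)
    return np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

def pareto_front(objectives):
    """Indices of non-dominated rows of an (n, 2) array; both maximized."""
    keep = []
    for i, row in enumerate(objectives):
        # Dominated if some row is >= in every objective and > in at least one.
        dominated = np.any(np.all(objectives >= row, axis=1) &
                           np.any(objectives > row, axis=1))
        if not dominated:
            keep.append(i)
    return keep
```

Given a population `pop` of coefficient vectors and the `auc` helper above, `pareto_front(np.column_stack([[log_likelihood(b, X, y) for b in pop], [auc(X @ b, y) for b in pop]]))` returns the indices of the candidates that trade off calibration against discrimination without being dominated on both.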

Keywords

Logistic regression · Evolutionary algorithms · Estimation of distribution algorithms · Calibration and discrimination

Mathematics Subject Classification (2000)

62J12 · 90C59 · 90C29

Copyright information

© Sociedad de Estadística e Investigación Operativa 2008

Authors and Affiliations

  • V. Robles (1)
  • C. Bielza (2)
  • P. Larrañaga (2)
  • S. González (1)
  • L. Ohno-Machado (3)

  1. Departamento de Arquitectura y Tecnología de Sistemas Informáticos, Universidad Politécnica de Madrid, Madrid, Spain
  2. Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Madrid, Spain
  3. Division of Health Science and Technology, Harvard University and MIT, Boston, USA
