Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression

Annals of Behavioral Medicine

Abstract

Background: Audience segmentation strategies are of increasing interest to public health professionals who wish to identify easily defined, mutually exclusive population subgroups whose members share similar characteristics that help determine participation in a health-related behavior as a basis for targeted interventions. Classification and regression tree (C&RT) analysis is a nonparametric decision tree methodology that has the ability to efficiently segment populations into meaningful subgroups. However, it is not commonly used in public health.

Purpose: This study provides a methodological overview of C&RT analysis for persons unfamiliar with the procedure.

Methods and Results: An example of a C&RT analysis is provided and interpretation of results is discussed. Results are validated with those obtained from a logistic regression model that was created to replicate the C&RT findings. Results obtained from the example C&RT analysis are also compared to those obtained from a common approach to logistic regression, the stepwise selection procedure. Issues to consider when deciding whether to use C&RT are discussed, and situations in which C&RT may and may not be beneficial are described.

Conclusions: C&RT is a promising research tool for the identification of at-risk populations in public health research and outreach.
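To make the idea of tree-based segmentation concrete, the following is a minimal sketch of recursive binary partitioning with Gini impurity, the kind of splitting rule commonly associated with classification trees. It is not the article's analysis: the data are invented toy values, and the names `age`, `provider`, `gini`, `best_split`, and `segments` are hypothetical, chosen only for illustration. Pruning and cross-validation, which a full C&RT analysis would normally include, are omitted.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature, threshold, weighted_impurity) for the binary split
    with the lowest weighted Gini impurity, or None if no split exists."""
    best = None
    n = len(rows)
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue  # a split must send cases to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

def segments(rows, labels, max_depth=2, path=()):
    """Recursively partition the sample and return the terminal subgroups
    as (rule, n, share_with_outcome) tuples."""
    split = best_split(rows, labels) if max_depth > 0 and gini(labels) > 0 else None
    if split is None:
        rule = " and ".join(path) or "all"
        return [(rule, len(labels), sum(labels) / len(labels))]
    feature, threshold, _ = split
    out = []
    for keep, op in ((lambda v: v <= threshold, "<="), (lambda v: v > threshold, ">")):
        idx = [i for i, r in enumerate(rows) if keep(r[feature])]
        out += segments([rows[i] for i in idx], [labels[i] for i in idx],
                        max_depth - 1, path + (f"{feature} {op} {threshold}",))
    return out

# Invented toy sample: age in years, provider = 1 if the person has a
# regular health care provider, label = 1 if they engaged in the behavior.
rows = [
    {"age": 30, "provider": 0}, {"age": 35, "provider": 1},
    {"age": 40, "provider": 0}, {"age": 45, "provider": 1},
    {"age": 66, "provider": 0}, {"age": 70, "provider": 1},
    {"age": 72, "provider": 0}, {"age": 80, "provider": 1},
]
labels = [0, 0, 0, 0, 0, 1, 0, 1]

result = segments(rows, labels)
for rule, n, share in result:
    print(f"{rule}: n={n}, share with outcome={share:.2f}")
```

Each returned rule describes one terminal subgroup, and every case falls into exactly one of them, which is what makes the output usable as a set of easily defined, mutually exclusive segments.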

Author information

Corresponding author

Correspondence to Stephenie C. Lemon, Ph.D.

Additional information

This article was prepared as part of Stephenie Lemon’s doctoral dissertation at Brown University.

About this article

Cite this article

Lemon, S.C., Roy, J., Clark, M.A. et al. Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression. ann. behav. med. 26, 172–181 (2003). https://doi.org/10.1207/S15324796ABM2603_02

  • DOI: https://doi.org/10.1207/S15324796ABM2603_02
