Abstract
Background: Audience segmentation strategies are of increasing interest to public health professionals who wish to identify easily defined, mutually exclusive population subgroups whose members share similar characteristics that help determine participation in a health-related behavior as a basis for targeted interventions. Classification and regression tree (C&RT) analysis is a nonparametric decision tree methodology that has the ability to efficiently segment populations into meaningful subgroups. However, it is not commonly used in public health.

Purpose: This study provides a methodological overview of C&RT analysis for persons unfamiliar with the procedure.

Methods and Results: An example of a C&RT analysis is provided and interpretation of results is discussed. Results are validated with those obtained from a logistic regression model that was created to replicate the C&RT findings. Results obtained from the example C&RT analysis are also compared to those obtained from a common approach to logistic regression, the stepwise selection procedure. Issues to consider when deciding whether to use C&RT are discussed, and situations in which C&RT may and may not be beneficial are described.

Conclusions: C&RT is a promising research tool for the identification of at-risk populations in public health research and outreach.
Additional information
This article was prepared as part of Stephenie Lemon’s doctoral dissertation at Brown University.
About this article
Cite this article
Lemon, S.C., Roy, J., Clark, M.A. et al. Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression. Ann. Behav. Med. 26, 172–181 (2003). https://doi.org/10.1207/S15324796ABM2603_02