Detecting Moderator Effects Using Subgroup Analyses

Prevention Science

Abstract

In the analysis of prevention and intervention studies, it is often important to investigate whether treatment effects vary among subgroups of patients defined by individual characteristics. These “subgroup analyses” can provide information about how best to use a new prevention or intervention program. However, subgroup analyses can be misleading if they test data-driven hypotheses, employ inappropriate statistical methods, or fail to account for multiple testing. These problems have led to a general suspicion of findings from subgroup analyses. This article discusses sound methods for conducting subgroup analyses to detect moderators. Multiple authors have argued that, to assess whether a treatment effect varies across subgroups defined by patient characteristics, analyses should be based on tests for interaction rather than on separate treatment comparisons within each subgroup. We discuss the concept of heterogeneity and its dependence on the metric used to describe treatment effects. We discuss issues of multiple comparisons related to subgroup analyses and the importance of accounting for multiplicity when interpreting results. We also discuss the types of questions that motivate subgroup analyses and how different scientific goals may affect the study at the design stage. Finally, we discuss subgroup analyses based on post-baseline factors and the additional complexity they introduce.
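To make the interaction-test and effect-metric points concrete, here is a minimal Python sketch under stated assumptions; it is an illustration, not the authors' procedure. It uses hypothetical event counts for two subgroups (the names `LOW_RISK`, `HIGH_RISK`, and every helper function are made up for this example), large-sample Wald tests for a treatment-by-subgroup interaction on the risk-difference and log-risk-ratio scales, and Holm's (1979) step-down adjustment applied to a hypothetical set of interaction p-values. The Wald tests assume two independent subgroups and samples large enough for the normal approximation.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a statistic that is approximately standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def risk_difference(events_t, n_t, events_c, n_c):
    """Treated-minus-control risk difference with its large-sample standard error."""
    p_t, p_c = events_t / n_t, events_c / n_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return p_t - p_c, se


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log treated/control risk ratio with its delta-method standard error."""
    p_t, p_c = events_t / n_t, events_c / n_c
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return math.log(p_t / p_c), se


def interaction_test(effect_a, effect_b):
    """Wald test of equal treatment effects in two independent subgroups.

    Each argument is an (estimate, standard_error) pair on the same scale.
    Returns (difference in effects, z statistic, two-sided p-value).
    """
    (est_a, se_a), (est_b, se_b) = effect_a, effect_b
    diff = est_b - est_a
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, z, two_sided_p(z)


def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1,
        # capped at 1, and forced to be non-decreasing along the sorted order.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[idx]))
        adjusted[idx] = running_max
    return adjusted


# Hypothetical (events, n) counts per arm; these are NOT data from the article.
LOW_RISK = {"treated": (40, 400), "control": (80, 400)}
HIGH_RISK = {"treated": (80, 400), "control": (160, 400)}


def compare_scales():
    """Run the same subgroup-by-treatment interaction test on two effect scales."""
    results = {}
    for name, measure in [("risk difference", risk_difference),
                          ("log risk ratio", log_risk_ratio)]:
        low = measure(*LOW_RISK["treated"], *LOW_RISK["control"])
        high = measure(*HIGH_RISK["treated"], *HIGH_RISK["control"])
        results[name] = interaction_test(low, high)
    return results


if __name__ == "__main__":
    for name, (diff, z, p) in compare_scales().items():
        print(f"{name}: interaction estimate = {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
    # Hypothetical interaction p-values for four candidate moderators.
    print(holm_adjust([0.013, 0.20, 0.04, 0.60]))
```

Because the treatment halves the risk in both hypothetical subgroups, the ratio-scale interaction estimate is exactly zero, yet the absolute risk reduction (0.10 versus 0.20) differs enough for the difference-scale test to reject at the 5% level: whether heterogeneity is present depends on the chosen metric. The Holm step multiplies the smallest of the four hypothetical p-values by 4, so 0.013 becomes 0.052, showing how a nominally significant moderator may not survive a multiplicity adjustment.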

References

  • Aguinis, H., & Gottfredson, R.K. (2010). Best-practice recommendations for estimating interaction effects using moderated multiple regression. Journal of Organizational Behavior, 31, 776–786. doi:10.1002/job.719.

  • Aiken, L.S., & West, S.G. (1991). Multiple regression: testing and interpreting interactions. Newbury Park, CA: Sage.

  • Altman, D.G., & Andersen, K. (1999). Calculating the number needed to treat for trials where the outcome is time to an event. British Medical Journal, 319, 1492–1495. Retrieved from http://www.bmj.com/.

  • Altman, D.G., Schulz, K.F., Moher, D., Egger, M., Davidoff, F., Elbourne, D., ... Lang, T. (2001). The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Annals of Internal Medicine, 134, 663–694. Retrieved from http://www.annals.org/.

  • Assmann, S.F., Hosmer, D.W., Lemeshow, S., & Mundt, K.A. (1996). Confidence intervals for measures of interactions. Epidemiology, 7, 286–290. doi:10.1097/00001648-199605000-00012.

  • Assmann, S.F., Pocock, S.J., Enos, L.E., & Kasten, L.E. (2000). Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet, 355, 1064–1069. doi:10.1016/S0140-6736(00)02039-0.

  • Baron, R.M., & Kenny, D.A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182. doi:10.1037/0022-3514.51.6.1173.

  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289–300. Retrieved from http://www.wiley.com/bw/journal.asp?ref=1369-7412&site=1.

  • Bombardier, C., Laine, L., Reicin, A., Shapiro, D., Burgos-Vargas, R., Davis, B., ... Schnitzer, T.J. (2000). Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis. New England Journal of Medicine, 343, 1520–1528. doi:10.1056/NEJM200011233432103.

  • Bonetti, M., & Gelber, R.D. (2000). A graphical method to assess treatment-covariate interactions using the Cox model on subsets of the data. Statistics in Medicine, 19, 2595–2609. doi:10.1002/1097-0258(20001015)19:19<2595::AID-SIM562>3.0.CO;2-M.

  • Bonetti, M., & Gelber, R.D. (2004). Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics, 5, 465–481. doi:10.1093/biostatistics/kxh002.

  • Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1998). Classification and regression trees. Boca Raton, FL: Chapman & Hall/CRC.

  • Byar, D.P. (1985). Assessing apparent treatment-covariate interactions in randomized clinical trials. Statistics in Medicine, 4, 255–263. doi:10.1002/sim.4780040304.

  • Byar, D.P., & Green, S. (1980). The choice of treatment for cancer patients based on covariate information: Application to prostate cancer. Bulletin du Cancer, 67, 477–490. Retrieved from http://www.john-libbey-eurotext.fr/en/revues/medecine/bdc/sommaire.md.

  • Cai, T., Tian, L., Wong, P.H., & Wei, L.J. (2010). Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics, Advance online publication. doi:10.1093/biostatistics/kxq060.

  • Cole, S.R., & Hernan, M.A. (2002). Fallibility in estimating direct effects. International Journal of Epidemiology, 31, 163–165. doi:10.1093/ije/31.1.163.

  • Collins, L.M. (2009). Design of experiments with multiple independent variables: A resource management perspective on complete and reduced factorial designs. Psychological Methods, 14, 202–224. doi:10.1037/a0015826.

  • Cook, R.J., & Sackett, D.L. (1995). The number needed to treat: A clinically useful measure of treatment effect. BMJ, 310, 452–454. Retrieved from http://www.bmj.com/.

  • Curfman, G.D., Morrissey, S., & Drazen, J.M. (2005). Expression of concern: Bombardier et al., Comparison of upper gastrointestinal toxicity of rofecoxib and naproxen in patients with rheumatoid arthritis. N Engl J Med 2000;343:1520–8. New England Journal of Medicine, 353, 2813–2814. doi:10.1056/NEJMe058314.

  • Gail, M., & Simon, R. (1985). Testing for qualitative interactions between treatment effects and patient subsets. Biometrics, 41, 361–372. doi:10.2307/2530862.

  • Gardner, F., Connell, A., Trentacosta, C.J., Shaw, D.S., Dishion, T.J., & Wilson, M.N. (2009). Moderators of outcome in a brief family-centered intervention for preventing early problem behavior. Journal of Consulting and Clinical Psychology, 77, 543–553. doi:10.1037/a0015622.

  • Halperin, M., Ware, J.H., Byar, D.P., Mantel, N., Brown, C.C., Koziol, J., ... Green, S.B. (1977). Testing for interaction in an I × J × K contingency table. Biometrika, 64, 271–275. doi:10.2307/2335693.

  • Hastie, T., & Tibshirani, R. (1990). Generalized additive models. Boca Raton, FL: Chapman and Hall/CRC.

  • Hernández, A., Boersma, E., Murray, G.D., Habbema, J.D., & Steyerberg, E.W. (2006). Subgroup analyses in therapeutic cardiovascular clinical trials: Are most of them misleading? American Heart Journal, 151, 257–264. doi:10.1016/j.ahj.2005.04.020.

  • Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802. doi:10.1093/biomet/75.4.800.

  • Holm, S. (1979). A simple sequential rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70. Retrieved from http://www.blackwellpublishing.com/journal.asp?ref=0303-6898.

  • Hommel, G. (1988). A stagewise rejective multiple test procedure on a modified Bonferroni test. Biometrika, 75, 383–386. doi:10.1093/biomet/75.2.383.

  • Hosmer, D.W., & Lemeshow, S. (1992). Confidence interval estimation of interaction. Epidemiology, 3, 452–456. doi:10.1097/00001648-199209000-00012.

  • Imai, K., Keele, L., & Tingley, D. (2010). A general approach to causal mediation analysis. Psychological Methods, 15, 309–334. doi:10.1037/a0020761.

  • Jackson, R.D., LaCroix, A.Z., Gass, M., Wallace, R.B., Robbins, J., Lewis, C.E., ... Barad, D. (2006). Calcium plus vitamin D supplementation and the risk of fractures. New England Journal of Medicine, 354, 669–683. doi:10.1056/NEJMoa055218 [Erratum, N Engl J Med 2006; 354:1102].

  • Jo, B. (2008). Causal inference in randomized experiments with mediational processes. Psychological Methods, 13, 314–336. doi:10.1037/a0014207.

  • Judd, C.M., & Kenny, D.A. (1981). Process analysis: Estimating mediation in treatment evaluations. Evaluation Review, 5, 602–619. doi:10.1177/0193841X8100500502.

  • Julius, S., Nesbitt, S.D., Egan, B.M., Weber, M.A., Michelson, E.L., Kaciroti, N., ... Schork, M.A. (2006). Feasibility of treating prehypertension with an angiotensin-receptor blocker. New England Journal of Medicine, 354, 1685–1697. doi:10.1056/NEJMoa060838.

  • Kent, D.M., & Hayward, R.A. (2007). Limitations of applying summary results of clinical trials to individual patients: The need for risk stratification. Journal of the American Medical Association, 298, 1209–1212. doi:10.1001/jama.298.10.1209.

  • Keppel, G., & Wickens, T.D. (2004). Design and analysis: A researcher’s handbook. Upper Saddle River, NJ: Pearson/Prentice Hall.

  • Koch, G.G., & Gansky, S.A. (1996). Statistical considerations for multiplicity in confirmatory protocols. Drug Information Journal, 30, 523–533. Retrieved from http://www.diahome.org/DIAHome/Resources/FindPublications.aspx.

  • Kraemer, H.C. (2004). Reconsidering the odds ratio as a measure of 2 × 2 association in a population. Statistics in Medicine, 23, 257–270. doi:10.1002/sim.1714.

  • Kraemer, H.C. (2006). Moderators of treatment outcomes: Clinical, research, and policy importance. Journal of the American Medical Association, 296, 1286–1289. doi:10.1001/jama.296.10.1286.

  • Kraemer, H.C. (2008). Toward non-parametric and clinically meaningful moderators and mediators. Statistics in Medicine, 27, 1679–1692. doi:10.1002/sim.3149.

  • Kraemer, H.C., Wilson, T., Fairburn, C.G., & Agras, W.S. (2002). Mediators and moderators of treatment effects in randomized clinical trials. Archives of General Psychiatry, 59, 877–883. doi:10.1001/archpsyc.59.10.877.

  • Kraemer, H.C., Kiernan, M., Essex, M., & Kupfer, D.J. (2008). How and why criteria defining moderators and mediators differ between the Baron & Kenny and the MacArthur approaches. Health Psychology, 27, S101–S108. Retrieved from http://www.apa.org/pubs/journals/hea/.

  • Lagakos, S.W. (2006). The challenge of subgroup analyses—reporting without distorting. New England Journal of Medicine, 354, 1667–1669. doi:10.1056/NEJMp068070.

  • Lemon, S.C., Roy, J., Clark, M.A., Friedmann, P.D., & Rakowski, W. (2003). Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression. Annals of Behavioral Medicine, 26, 172–181. doi:10.1207/S15324796ABM2603_02.

  • Li, R., & Chambless, L. (2007). Test for additive interaction in proportional hazards models. Annals of Epidemiology, 17, 227–236. doi:10.1016/j.annepidem.2006.10.009.

  • MacCallum, R.C., Zhang, S., Preacher, K.J., & Rucker, D.D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19–40. doi:10.1037/1082-989X.7.1.19.

  • MacKinnon, D.P. (2008). Introduction to statistical mediation analysis. New York, NY: Taylor & Francis Group.

  • MacKinnon, D.P., & Dwyer, J.H. (1993). Estimating mediated effects in prevention studies. Evaluation Review, 17, 144–158. doi:10.1177/0193841X9301700202.

  • Marra, G., & Radice, R. (2010). Penalised regression splines: Theory and application to medical research. Statistical Methods in Medical Research, 19, 107–125. doi:10.1177/0962280208096688.

  • Meckstroth, A., Burwick, A., Moore, Q., Ponza, M., Marsh, S., McGuirk, A., Zhao, Z. (2008). Teaching self-sufficiency: An impact and benefit-cost analysis of a home visitation and life skills education program. Retrieved from Mathematica Policy Research website: http://www.mathematica-mpr.com/publications/pdfs/teaching_self.pdf

  • Newcombe, R.G. (2006). A deficiency of the odds ratio as a measure of effect size. Statistics in Medicine, 25, 4235–4240. doi:10.1002/sim.2683.

  • Pan, G., & Wolfe, D.A. (1997). Test for qualitative interaction of clinical significance. Statistics in Medicine, 16, 1645–1652. doi:10.1002/(SICI)1097-0258(19970730)16:14<1645::AID-SIM596>3.0.CO;2-G.

  • Patel, K.M., & Hoel, D.G. (1973). A nonparametric test for interaction in factorial experiments. Journal of the American Statistical Association, 68, 615–620. doi:10.2307/2284788.

  • Pearl, J. (2001). Direct and indirect effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–420. San Francisco: Morgan Kaufmann.

  • Peto, R. (1982). Statistical aspects of cancer trials. In K. E. Halnan (Ed.), Treatment of Cancer (pp. 867–871). London: Chapman and Hall.

  • Piantadosi, S., & Gail, M.H. (1993). A comparison of the power of two tests for qualitative interactions. Statistics in Medicine, 12, 1239–1248. doi:10.1002/sim.4780121105.

  • Robins, J.M., & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology, 3, 143–155. doi:10.1097/00001648-199203000-00013.

  • Rothman, K.J. (1986). Modern Epidemiology. Boston, MA: Little, Brown and Company.

  • Sackett, D.L. (1996). Down with odds ratios! Evidence-Based Medicine, 1, 164–166. doi:10.1629/09178.

  • Sacks, F.M., Pfeffer, M.A., Moye, L.A., Rouleau, J.L., Rutherford, J.D., Cole, T.G., ... Braunwald, E. (1996). The effect of pravastatin on coronary events after myocardial infarction in patients with average cholesterol levels. New England Journal of Medicine, 335, 1001–1009. doi:10.1056/NEJM199610033351401.

  • Schemper, M. (1988). Non-parametric analysis of treatment-covariate interaction in the presence of censoring. Statistics in Medicine, 7, 1257–1266. doi:10.1002/sim.4780071206.

  • Schwartz, L.M., Woloshin, S., & Welch, H.G. (1999). Misunderstandings about the effects of race and sex on physicians’ referrals for cardiac catheterization. New England Journal of Medicine, 341, 279–283. doi:10.1056/NEJM199907223410411.

  • Shaffer, J.P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46, 561–584. doi:10.1146/annurev.ps.46.020195.003021.

  • Shuster, J., & van Eys, J. (1983). Interaction between prognostic factors and treatment. Controlled Clinical Trials, 4, 209–214. doi:10.1016/0197-2456(83)90004-1.

  • Silvapulle, M.J. (2001). Tests against qualitative interaction: Exact critical values and robust tests. Biometrics, 57, 1157–1165. doi:10.1111/j.0006-341X.2001.01157.x.

  • Simes, J.R. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73, 751–754. doi:10.1093/biomet/73.3.751.

  • Sleeper, L.A., & Harrington, D.P. (1990). Regression splines in the Cox model with application to covariate effects in liver disease. Journal of the American Statistical Association, 85, 941–949. doi:10.2307/2289591.

  • Sobel, M.E. (2008). Identification of causal parameters in randomized studies with mediating variables. Journal of Educational and Behavioral Statistics, 33, 230–251. doi:10.3102/1076998607307239.

  • Song, S., & Pepe, M.S. (2004). Evaluating markers for selecting a patient’s treatment. Biometrics, 60, 874–883. doi:10.1111/j.0006-341X.2004.00242.x.

  • Storey, J.D., & Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences of the United States of America, 100, 9440–9445. doi:10.1073/pnas.1530509100.

  • Tolan, P.H., Gorman-Smith, D., Henry, D., & Schoney, M. (2009). The benefits of booster interventions: Evidence from a family-focused prevention program. Prevention Science, 10, 287–297. doi:10.1007/s11121-009-0139-8.

  • Van den Berghe, G., Wilmer, A., Hermans, G., Meersseman, W., Wouters, P.J., Milants, L., ... Bouillon, R. (2006). Intensive insulin therapy in the medical ICU. New England Journal of Medicine, 354, 449–461. doi:10.1056/NEJMoa052521.

  • VanderWeele, T.J. (2010). Bias formulas for sensitivity analysis for direct and indirect effects. Epidemiology, 21, 540–551. doi:10.1097/EDE.0b013e3181df191c.

  • VanderWeele, T.J., & Knol, M.J. (2011). The interpretation of subgroup analyses in randomized trials: Heterogeneity versus secondary interventions. Annals of Internal Medicine, in press.

  • VanderWeele, T.J., & Robins, J.M. (2007). Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology, 18, 561–568. doi:10.1097/EDE.0b013e318127181b.

  • VanderWeele, T.J., & Vansteelandt, S. (2009). Conceptual issues concerning mediation, interventions and composition. Statistics and Its Interface, 2, 457–468. Retrieved from http://www.intlpress.com/SII/.

  • VanderWeele, T.J., & Vansteelandt, S. (2010). Odds ratios for mediation analysis with a dichotomous outcome. American Journal of Epidemiology, 172, 1339–1348. doi:10.1093/aje/kwq332.

  • Wactawski-Wende, J., Kotchen, J.M., Anderson, G.L., Assaf, A.R., Brunner, R.L., O’Sullivan, M.J., ... Manson, E. (2006). Calcium plus vitamin D supplementation and the risk of colorectal cancer. New England Journal of Medicine, 354, 684–696. doi:10.1056/NEJMoa055222.

  • Wang, R., Lagakos, S.W., Ware, J.H., Hunter, D.J., & Drazen, J.M. (2007). Statistics in medicine: Reporting of subgroup analyses in clinical trials. New England Journal of Medicine, 357, 2189–2194. doi:10.1056/NEJMsr077003.

  • Wen, L., Badgett, R., & Cornell, J. (2005). Number needed to treat: A descriptor for weighing therapeutic options. American Journal of Health-System Pharmacy, 62, 2031–2036. doi:10.2146/ajhp040558.

Acknowledgment

We dedicate this paper to our friend and colleague, Dr. Stephen W. Lagakos, who inspired the work and provided valuable insights and discussions on many aspects of subgroup analyses. We are grateful to Drs. Robert J. McMahon, David P. MacKinnon, Tyler VanderWeele, and three reviewers for their comments, which have improved the paper. This work was supported in part by grant AI24643 from the National Institutes of Health.

Author information

Correspondence to Rui Wang.

About this article

Cite this article

Wang, R., Ware, J.H. Detecting Moderator Effects Using Subgroup Analyses. Prev Sci 14, 111–120 (2013). https://doi.org/10.1007/s11121-011-0221-x

  • DOI: https://doi.org/10.1007/s11121-011-0221-x
