Prevention Science, Volume 14, Issue 2, pp 111–120

Detecting Moderator Effects Using Subgroup Analyses

  • Rui Wang
  • James H. Ware


In the analysis of prevention and intervention studies, it is often important to investigate whether treatment effects vary among subgroups of patients defined by individual characteristics. These “subgroup analyses” can provide information about how best to use a new prevention or intervention program. However, subgroup analyses can be misleading if they test data-driven hypotheses, employ inappropriate statistical methods, or fail to account for multiple testing. These problems have led to a general suspicion of findings from subgroup analyses. This article discusses sound methods for conducting subgroup analyses to detect moderators. Multiple authors have argued that, to assess whether a treatment effect varies across subgroups defined by patient characteristics, analyses should be based on tests for interaction rather than treatment comparisons within the subgroups. We discuss the concept of heterogeneity and its dependence on the metric used to describe treatment effects. We discuss issues of multiple comparisons related to subgroup analyses and the importance of considering multiplicity in the interpretation of results. We also discuss the types of questions that would lead to subgroup analyses and how different scientific goals may affect the study at the design stage. Finally, we discuss subgroup analyses based on post-baseline factors and the complexity associated with this type of subgroup analysis.
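The distinction between a test for interaction and separate within-subgroup comparisons, and the effect of multiplicity, can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes two independent subgroups, treatment-effect estimates on a single chosen scale (for example, a difference in means) that are approximately normal, and independent tests for the multiplicity calculation. The function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def interaction_z_test(effect_a, se_a, effect_b, se_b):
    """Test whether treatment effects differ between two independent subgroups.

    Assumes both estimates are on the same scale and approximately normal.
    Returns the z statistic and two-sided p-value for the difference.
    """
    diff = effect_a - effect_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)  # variances add for independent estimates
    z = diff / se_diff
    return z, two_sided_p(z)


def chance_of_false_positive(k, alpha=0.05):
    """Probability of at least one false positive among k independent tests
    when no true effect exists and each test is run at level alpha."""
    return 1 - (1 - alpha) ** k


# Hypothetical estimates: the effect looks "significant" in subgroup A only.
p_a = two_sided_p(0.30 / 0.12)  # within-subgroup test, subgroup A
p_b = two_sided_p(0.10 / 0.12)  # within-subgroup test, subgroup B
z_int, p_int = interaction_z_test(0.30, 0.12, 0.10, 0.12)

print(f"subgroup A p = {p_a:.3f}, subgroup B p = {p_b:.3f}")
print(f"interaction z = {z_int:.2f}, p = {p_int:.3f}")
print(f"P(at least one false positive in 10 tests) = {chance_of_false_positive(10):.2f}")
```

With these made-up inputs, the within-subgroup tests disagree (one falls below 0.05, the other does not), yet the interaction test gives a p-value of roughly 0.24, so the data do not support a claim that the effect differs between subgroups. The last line shows why multiplicity matters: ten independent tests at the 0.05 level carry about a 40% chance of at least one spurious finding.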


Keywords: Moderator, Subgroup analysis, Heterogeneity, Interaction, Subset



We dedicate this paper to our friend and colleague, Dr. Stephen W. Lagakos, who inspired the work and provided valuable insights and discussions on many aspects of subgroup analyses. We are grateful to Drs. Robert J. McMahon, David P. MacKinnon, Tyler VanderWeele, and three reviewers for their comments, which have improved the paper. This work was supported in part by grant AI24643 from the National Institutes of Health.



Copyright information

© Society for Prevention Research 2011

Authors and Affiliations

  1. Department of Biostatistics, Harvard School of Public Health, Boston, USA
