Equivalence Tests in Subgroup Analyses

  • A. Ring
  • M. Scharpenberg
  • S. Grill
  • R. Schall
  • W. Brannath
Part of the ICSA Book Series in Statistics (ICSABSS)


Confirmatory clinical trials that aim to demonstrate the efficacy of drugs are typically performed in broad patient populations, which are usually heterogeneous with respect to demographic variables and medical conditions. Regulatory guidelines therefore request that, in addition to the primary comparison of the treatment effects in the total study population, the consistency of the treatment effect be evaluated across medically relevant subgroups (e.g. defined by gender, age or comorbidities).

We propose that the consistency of the treatment effect in two subgroups be assessed using an equivalence test, which in this context we call a consistency test. The proposed tests compare the treatment contrasts in the two subgroups and aim to reject the null hypothesis of heterogeneity.
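
One way to write down the hypotheses of such a consistency test is sketched here. The symbols are notational assumptions made for illustration and do not appear in the original: $\theta_1$ and $\theta_2$ denote the treatment contrasts in the two subgroups, $\delta$ their difference, and $\Delta > 0$ an equivalence margin fixed in advance.

```latex
\[
\delta = \theta_1 - \theta_2, \qquad
H_0:\ |\delta| \ge \Delta \quad \text{(heterogeneity)}
\qquad \text{versus} \qquad
H_1:\ |\delta| < \Delta \quad \text{(consistency)}
\]
```

Under this formulation, rejecting $H_0$ supports the claim that the two subgroup effects differ by less than the margin. This reverses the roles of the hypotheses in the usual interaction test, where homogeneity is the null hypothesis.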

We present tests for both quantitative and binary outcome variables. While the details of these tests differ for the two types of outcome variable, both tests are based on a generalised linear model in which treatment, subgroup, and subgroup-by-treatment interaction terms are fitted.
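
For a quantitative outcome, the idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a saturated linear model with two subgroups and two arms, so that the interaction estimate reduces to a difference of cell-mean differences, and it uses two one-sided tests with a normal quantile as a large-sample stand-in for the t quantile. The function name `consistency_test`, the labels `"A"`, `"B"`, `"trt"`, `"ctl"`, the example data and the margin of 1.0 are all hypothetical.

```python
import math
from statistics import NormalDist, mean

# Hypothetical labels for the four subgroup-by-arm cells.
CELLS = [("A", "trt"), ("A", "ctl"), ("B", "trt"), ("B", "ctl")]


def consistency_test(cells, margin, alpha=0.05):
    """Two one-sided tests on the subgroup-by-treatment interaction.

    cells maps (subgroup, arm) to a list of outcomes; margin is an
    assumed equivalence margin on the outcome scale.
    """
    m = {k: mean(cells[k]) for k in CELLS}
    n = {k: len(cells[k]) for k in CELLS}
    # Interaction contrast: treatment effect in A minus treatment effect in B.
    delta = (m["A", "trt"] - m["A", "ctl"]) - (m["B", "trt"] - m["B", "ctl"])
    # Pooled residual variance of the saturated two-by-two linear model.
    ss = sum((y - m[k]) ** 2 for k in CELLS for y in cells[k])
    s2 = ss / (sum(n.values()) - 4)
    se = math.sqrt(s2 * sum(1.0 / n[k] for k in CELLS))
    # Normal quantile as a large-sample approximation (an assumption).
    z = NormalDist().inv_cdf(1.0 - alpha)
    lower, upper = delta - z * se, delta + z * se
    # Consistency is concluded when the (1 - 2*alpha) interval lies
    # entirely within (-margin, margin).
    return delta, (lower, upper), (-margin < lower and upper < margin)


def cell(center):
    """Deterministic toy data: 60 values spread evenly around center."""
    return [center - 1.0, center, center + 1.0] * 20


data = {
    ("A", "trt"): cell(2.0), ("A", "ctl"): cell(0.0),
    ("B", "trt"): cell(2.0), ("B", "ctl"): cell(0.0),
}
delta, interval, consistent = consistency_test(data, margin=1.0)
print(delta, interval, consistent)
```

For small samples, a t quantile with the residual degrees of freedom would be the more natural choice. For a binary outcome, the same contrast would typically be formed on the scale of the model's link function, which this sketch does not cover.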

In this chapter, we examine the basic properties of these consistency tests using Monte Carlo simulations. A key objective of these simulations is to suggest suitable equivalence margins, based on the performance of the tests in various settings. The investigation indicates that equivalence tests can be used both to assess the consistency of treatment effects across subgroups and to detect medically relevant heterogeneity in treatment effects across subgroups.
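
A simulation of this kind might be set up roughly as follows. This is a sketch under stated assumptions rather than the simulation design of the chapter: outcomes are normally distributed with a common standard deviation, the four cells have equal size, and the test is a large-sample two one-sided test on the interaction contrast. The names `tost_rejects` and `rejection_rate` and all numerical settings (cell size, margin, effect sizes, number of replications) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def tost_rejects(samples, margin, z):
    """Return True if heterogeneity is rejected.

    samples holds four lists in the order (A trt, A ctl, B trt, B ctl).
    """
    means = [sum(s) / len(s) for s in samples]
    # Interaction contrast: effect in subgroup A minus effect in subgroup B.
    delta = (means[0] - means[1]) - (means[2] - means[3])
    ss = sum((y - m) ** 2 for s, m in zip(samples, means) for y in s)
    df = sum(len(s) for s in samples) - 4
    se = math.sqrt(ss / df * sum(1.0 / len(s) for s in samples))
    return -margin < delta - z * se and delta + z * se < margin


def rejection_rate(true_interaction, margin, n_per_cell=50, sd=1.0,
                   alpha=0.05, n_sim=2000, seed=1):
    """Monte Carlo estimate of the probability of concluding consistency."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Assumed cell means: effect 1.0 in subgroup A and
    # 1.0 - true_interaction in subgroup B.
    centers = [1.0, 0.0, 1.0 - true_interaction, 0.0]
    hits = 0
    for _ in range(n_sim):
        samples = [[rng.gauss(c, sd) for _ in range(n_per_cell)]
                   for c in centers]
        hits += tost_rejects(samples, margin, z)
    return hits / n_sim


# Power when the subgroup effects are identical, and the rejection rate
# when the true interaction sits exactly on the margin.
power = rejection_rate(true_interaction=0.0, margin=1.0)
at_margin = rejection_rate(true_interaction=1.0, margin=1.0)
print(f"power at no interaction: {power:.3f}")
print(f"rejection rate at the margin: {at_margin:.3f}")
```

Repeating such runs over a grid of margins, sample sizes and true interactions gives a picture of how the choice of margin trades power against the risk of declaring consistency when a relevant interaction is present. The estimates carry Monte Carlo error, so the number of replications should be chosen with the required precision in mind.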


Keywords: Linear model; Statistical interaction; Subgroup analysis; Binary endpoint; Similarity; Homogeneity; Consistency



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • A. Ring (1, 2)
  • M. Scharpenberg (3)
  • S. Grill (3, 4)
  • R. Schall (1, 5)
  • W. Brannath (3)

  1. Department of Mathematical Statistics and Actuarial Science, University of the Free State, Bloemfontein, South Africa
  2. medac GmbH, Wedel, Germany
  3. Faculty of Mathematics/Computer Sciences, Competence Center for Clinical Trials Bremen, University of Bremen, Bremen, Germany
  4. Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
  5. IQVIA Biostatistics, Bloemfontein, South Africa
