Confident Statistical Inference with Multiple Outcomes, Subgroups, and Other Issues of Multiplicity

  • Siyoen Kil
  • Eloise Kaizar
  • Szu-Yu Tang
  • Jason C. Hsu
Living reference work entry


This chapter starts with a thorough discussion of different multiple comparison error rates, including weak and strong control of error rates in multiple testing and the noncoverage probability of confidence sets. Using multiple endpoints as an example, it describes which forms of error rate control translate into control of the rate of incorrect decisions. Then, in the context of targeted therapy, it discusses how some efficacy measures can fail to respect the logical relationships among subgroups, and it describes a statistical principle that helps avoid this issue. As another example of multiplicity-induced issues to be aware of, it shows that permutation tests for patient targeting may not control the Type I error rate in some situations. Finally, it lists the key points and summarizes the conclusions.
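The difference between weak and strong control can be made concrete with a small Monte Carlo sketch. It assumes independent, normally distributed z-statistics for five endpoints and a hypothetical two-stage rule: a Bonferroni global test, followed by unadjusted testing of each endpoint if the global test rejects. The rule, the effect sizes, and the function names are illustrative choices, not procedures taken from the chapter. The rule controls the familywise error rate when every null hypothesis is true (weak control). It does not control that rate when some nulls are false, so strong control fails.

```python
import math
import random


def one_sided_p(z: float) -> float:
    """Upper-tail p-value for a standard normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def protected_unadjusted(pvals, alpha):
    """Hypothetical two-stage rule, used for illustration only.

    Stage 1: Bonferroni global test of "all nulls true" (min p < alpha/k).
    Stage 2: if stage 1 rejects, test every H_i at the unadjusted level alpha.
    Returns the indices of rejected hypotheses.
    """
    k = len(pvals)
    if min(pvals) >= alpha / k:
        return []  # global null not rejected: reject nothing
    return [i for i, p in enumerate(pvals) if p < alpha]


def familywise_error(effects, alpha=0.05, n_sim=20_000, seed=1):
    """Monte Carlo estimate of P(reject at least one true null).

    effects[i] is the mean of the z-statistic for endpoint i; a zero
    entry means H_i is true. Endpoints are assumed independent.
    """
    rng = random.Random(seed)
    true_nulls = {i for i, mu in enumerate(effects) if mu == 0.0}
    errors = 0
    for _ in range(n_sim):
        pvals = [one_sided_p(rng.gauss(mu, 1.0)) for mu in effects]
        rejected = protected_unadjusted(pvals, alpha)
        if any(i in true_nulls for i in rejected):
            errors += 1
    return errors / n_sim


# Complete null: all five endpoints have no effect (weak control holds).
fwer_complete = familywise_error([0.0] * 5)
# Partial null: one endpoint has a large effect, four are truly null.
fwer_partial = familywise_error([6.0, 0.0, 0.0, 0.0, 0.0])
print(f"complete null FWER ~ {fwer_complete:.3f}")
print(f"partial null FWER  ~ {fwer_partial:.3f}")
```

Under these assumptions, the first estimate should be near 1 - 0.99^5 ≈ 0.049, which is within the nominal 0.05. The second should be near 1 - 0.95^4 ≈ 0.185. In that case the large effect almost always opens the gate, and the four true nulls are then each tested at the full 0.05 level.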


Keywords: Subgroups; Multiple comparisons; Prognostic effect; Permutation tests

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Siyoen Kil, LSK Global Pharmaceutical Services, Seoul, Republic of Korea
  • Eloise Kaizar, The Ohio State University, Columbus, USA
  • Szu-Yu Tang, Roche Tissue Diagnostics, Oro Valley, USA
  • Jason C. Hsu, Department of Statistics, The Ohio State University, Columbus, USA

Section editors and affiliations

  • Stephen George, Department of Biostatistics and Bioinformatics, Basic Science Division, Duke University School of Medicine, Durham, USA