Abstract
Group sequential designs with multiple endpoints are widely used in clinical trials and medical research. In the literature on group sequential designs with a single primary hypothesis, the recommended way to obtain the critical boundary for a symmetric non-directional two-sided test at significance level \(\alpha \) is to construct a one-sided critical boundary at significance level \(\alpha /2\). As a natural but unvalidated extension, researchers may assume that the same conclusion holds when testing multiple endpoints in group sequential designs, that is, that the critical boundary calculated to control the type I error at level \(\alpha /2\) for a one-sided test also controls the type I error at level \(\alpha \) for non-directional two-sided tests. In this paper, we demonstrate that using one-sided \(\alpha /2\)-level critical boundaries for non-directional two-sided \(\alpha \)-level tests leads to notable and unnecessary conservativeness when multiple endpoints are hierarchically tested in group sequential designs. To the best of our knowledge, we are the first to reveal the relationship between the type I error rates of the two-sided and one-sided tests in hierarchical testing of multiple endpoints in group sequential designs. We consider three commonly used hierarchical structures for the primary and secondary endpoints, namely the stagewise hierarchical, overall hierarchical, and partially hierarchical structures. Under all three hierarchical structures, we refine the critical boundary for the secondary hypothesis. In addition, we prove that our boundary refinement methods yield enhanced statistical power. The proposed methodology is illustrated using a randomized clinical trial examining the effect of energy-reduced diets with different macronutrient compositions on body weight.
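The single-hypothesis recommendation mentioned above can be checked numerically. The following is a minimal sketch, not the refinement method of the paper: it assumes a design with two equally spaced looks and a constant (Pocock-type) boundary \(c\), and it covers only one endpoint, so it says nothing about the hierarchical secondary-endpoint boundaries. The function names (`one_sided_error`, `two_sided_error`, `solve_boundary`) and the choice of design are illustrative assumptions. Under the null hypothesis the statistics are modelled as \(Z_1 = S_1\) and \(Z_2 = (S_1 + S_2)/\sqrt{2}\) with independent standard normal \(S_1, S_2\).

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def one_sided_error(c):
    """P(Z1 >= c or Z2 >= c) under H0 for two equally spaced looks."""
    # Z2 < c is equivalent to S2 < c*sqrt(2) - Z1; the lower limit -8
    # truncates a tail whose mass is negligible.
    stay = simpson(lambda z: STD.pdf(z) * STD.cdf(c * sqrt(2) - z), -8.0, c)
    return 1.0 - stay

def two_sided_error(c):
    """P(|Z1| >= c or |Z2| >= c) under H0 for the same design."""
    def integrand(z):
        upper = STD.cdf(c * sqrt(2) - z)
        lower = STD.cdf(-c * sqrt(2) - z)
        return STD.pdf(z) * (upper - lower)
    return 1.0 - simpson(integrand, -c, c)

def solve_boundary(error_fn, level, lo=1.0, hi=4.0):
    """Bisection for c with error_fn(c) == level (error_fn decreases in c)."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if error_fn(mid) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = 0.05
c_one = solve_boundary(one_sided_error, alpha / 2)  # one-sided, level alpha/2
c_two = solve_boundary(two_sided_error, alpha)      # two-sided, level alpha

print(f"one-sided alpha/2 boundary: {c_one:.4f}")
print(f"two-sided alpha boundary:   {c_two:.4f}")
print(f"two-sided error at c_one:   {two_sided_error(c_one):.6f}")
```

In this single-endpoint setting the two boundaries should nearly coincide, because the two-sided error differs from twice the one-sided error only by the probability of crossing the upper boundary at one look and the lower boundary at the other. Deterministic quadrature is used instead of simulation so that the comparison is not blurred by Monte Carlo noise.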
Acknowledgements
We thank Ajit C. Tamhane and Cyrus R. Mehta for bringing this research question to our attention. We also thank Olympia Hadjiliadis, Dana Sylvan, Dong Xi and Cheng Li for comments that greatly improved the manuscript, and thank Vinicio Haro, So-Young Lim and Xiaojia Lu for helpful discussions. We thank Editor-in-Chief Christine H. Müller and an anonymous referee for their comments which helped to greatly improve the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, F., Gou, J. Refined critical boundary with enhanced statistical power for non-directional two-sided tests in group sequential designs with multiple endpoints. Stat Papers 62, 1265–1290 (2021). https://doi.org/10.1007/s00362-019-01134-7
DOI: https://doi.org/10.1007/s00362-019-01134-7