Abstract
We consider a two-group randomized clinical trial of prioritized endpoints, where mortality affects the assessment of a follow-up continuous outcome. With the continuous outcome as the principal outcome, we combine it with mortality via the worst-rank paradigm into a single composite endpoint. Then, we develop a weighted Wilcoxon–Mann–Whitney test statistic to analyze the data. We determine the optimal weights for the Wilcoxon–Mann–Whitney test statistic that maximize its power. We provide the rationale for the weights and their implications in the application of the method. In addition, we derive a formula for its power and demonstrate its accuracy in simulations. Finally, we apply the method to data from an acute ischemic stroke clinical trial of normobaric oxygen therapy.
References
Adams H., Jr., Davis, P., Leira, E., Chang, K., Bendixen, B., Clarke, W., et al. (1999). Baseline NIH stroke scale score strongly predicts outcome after stroke: A report of the Trial of Org 10172 in Acute Stroke Treatment (TOAST). Neurology, 53(1), 126.
Ahmad, Y., Nijjer, S., Cook, C. M., El-Harasis, M., Graby, J., Petraco, R., et al. (2015). A new method of applying randomised control study data to the individual patient: A novel quantitative patient-centred approach to interpreting composite end points. International Journal of Cardiology, 195, 216–224.
Allen, L. A., Hernandez, A. F., O’Connor, C. M., & Felker, G. M. (2009). End points for clinical trials in acute heart failure syndromes. Journal of the American College of Cardiology, 53(24), 2248–2258.
Anker, S. D., & McMurray, J. J. (2012). Time to move on from ‘time-to-first’: Should all events be included in the analysis of clinical trials? European Heart Journal, 33(22), 2764–2765.
Anker, S. D., Schroeder, S., Atar, D., Bax, J. J., Ceconi, C., Cowie, M. R., et al. (2016). Traditional and new composite endpoints in heart failure clinical trials: Facilitating comprehensive efficacy assessments and improving trial efficiency. European Journal of Heart Failure, 18(5), 482–489.
Anstrom, K. J., & Eisenstein, E. L. (2011). From batting average to wins above replacement to composite end points-refining clinical research using baseball statistical methods. American Heart Journal, 161(5), 805–806.
Armstrong, P. W., & Westerhout, C. M. (2013). The power of more than one. Circulation, 127, 665–667.
Armstrong, P. W., & Westerhout, C. M. (2017). Composite end points in clinical research. Circulation, 135(23), 2299–2307.
Armstrong, P. W., Westerhout, C. M., Van de Werf, F., Califf, R. M., Welsh, R. C., Wilcox, R. G., et al. (2011). Refining clinical trial composite outcomes: An application to the assessment of the safety and efficacy of a new thrombolytic–3 (assent-3) trial. American Heart Journal, 161(5), 848–854.
Bakal, J. A., Roe, M. T., Ohman, E. M., Goodman, S. G., Fox, K. A., Zheng, Y., et al. (2015). Applying novel methods to assess clinical outcomes: Insights from the trilogy ACS trial. European Heart Journal, 36(6), 385–392.
Bakal, J. A., Westerhout, C. M., & Armstrong, P. W. (2012). Impact of weighted composite compared to traditional composite endpoints for the design of randomized controlled trials. Statistical Methods in Medical Research, 24(6), 980–988. https://doi.org/10.1177/0962280211436004
Bakal, J. A., Westerhout, C. M., Cantor, W. J., Fernández-Avilés, F., Welsh, R. C., Fitchett, D., et al. (2012). Evaluation of early percutaneous coronary intervention vs. standard therapy after fibrinolysis for st-segment elevation myocardial infarction: Contribution of weighting the composite endpoint. European Heart Journal, 34(12), 903–908.
Bebu, I., & Lachin, J. M. (2015). Large sample inference for a win ratio analysis of a composite outcome based on prioritized components. Biostatistics, 17(1), 178–187.
Berry, J. D., Miller, R., Moore, D. H., Cudkowicz, M. E., Van Den Berg, L. H., Kerr, D. A., et al. (2013). The combined assessment of function and survival (CAFS): A new endpoint for ALS clinical trials. Amyotrophic Lateral Sclerosis and Frontotemporal Degeneration, 14(3), 162–168.
Bonate, P. L. (2000). Analysis of pretest-posttest designs. Boca Raton: CRC Press.
Braunwald, E., Antman, E. M., Beasley, J. W., Califf, R. M., Cheitlin, M. D., Hochman, J. S., et al. (2002). ACC/AHA 2002 guideline update for the management of patients with unstable angina and non–st-segment elevation myocardial infarction–summary article: A report of the American college of cardiology/American heart association task force on practice guidelines (committee on the management of patients with unstable angina). Journal of the American College of Cardiology, 40(7), 1366–1374.
Brittain, E., Palensky, J., Blood, J., & Wittes, J. (1997). Blinded subjective rankings as a method of assessing treatment effect: A large sample example from the systolic hypertension in the elderly program (SHEP). Statistics in Medicine, 16(6), 681–693.
Brown, P. M., Anstrom, K. J., Felker, G. M., & Ezekowitz, J. A. (2016). Composite end points in acute heart failure research: Data simulations illustrate the limitations. Canadian Journal of Cardiology, 32(11), 1356.e21–1356.e28.
Brunner, E., & Munzel, U. (2000). The nonparametric Behrens-Fisher problem: Asymptotic theory and a small-sample approximation. Biometrical Journal, 42(1), 17–25.
Bruno, A., Saha, C., & Williams, L.S. (2006). Using change in the national institutes of health stroke scale to measure treatment effect in acute stroke trials. Stroke, 37(3), 920–921.
Buyse, M. (2010). Generalized pairwise comparisons of prioritized outcomes in the two-sample problem. Statistics in Medicine, 29(30), 3245–3257.
Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford Publications.
Chung, E., & Romano, J. P. (2016). Asymptotically valid and exact permutation tests based on two-sample U-statistics. Journal of Statistical Planning and Inference, 168, 97–105.
Claggett, B., Wei, L.-J., & Pfeffer, M. A. (2013). Moving beyond our comfort zone. European Heart Journal, 34(12), 869–871.
Cordoba, G., Schwartz, L., Woloshin, S., Bae, H., & Gotzsche, P. (2010). Definition, reporting, and interpretation of composite outcomes in clinical trials: Systematic review. British Medical Journal, 341, c3920.
Davis, S. M., Koch, G. G., Davis, C., & LaVange, L. M. (2003). Statistical approaches to effectiveness measurement and outcome-driven re-randomizations in the clinical antipsychotic trials of intervention effectiveness (CATIE) studies. Schizophrenia Bulletin, 29(1), 73.
DeCoster, T., Willis, M., Marsh, J., Williams, T., Nepola, J., Dirschl, D., & Hurwitz, S. (1999). Rank order analysis of tibial plafond fractures: Does injury or reduction predict outcome? Foot & Ankle International, 20(1), 44–49.
Dmitrienko, A., D’Agostino, R. B., & Huque, M. F. (2013). Key multiplicity issues in clinical drug development. Statistics in Medicine, 32(7), 1079–1111.
Fay, M. P., & Proschan, M. A. (2010). Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Statistics Surveys, 4, 1–39.
Feldman, A., Baughman, K., Lee, W., Gottlieb, S., Weiss, J., Becker, L., & Strobeck, J. (1991). Usefulness of OPC-8212, a quinolinone derivative, for chronic congestive heart failure in patients with ischemic heart disease or idiopathic dilated cardiomyopathy. The American Journal of Cardiology, 68(11), 1203–1210.
Felker, G., Anstrom, K., & Rogers, J. (2008). A global ranking approach to end points in trials of mechanical circulatory support devices. Journal of Cardiac Failure, 14(5), 368–372.
Felker, G. M., & Maisel, A. S. (2010). A global rank end point for clinical trials in acute heart failure. Circulation: Heart Failure, 3(5), 643–646.
Ferreira-Gonzalez, I., Permanyer-Miralda, G., Busse, J., Devereaux, P., Guyatt, G., Alonso-Coello, P., et al. (2009). Composite outcomes can distort the nature and magnitude of treatment benefits in clinical trials. Annals of Internal Medicine, 150(8), 566.
Ferreira-González, I., Permanyer-Miralda, G., Busse, J. W., Bryant, D. M., Montori, V. M., Alonso-Coello, P., et al. (2007a). Methodologic discussions for using and interpreting composite endpoints are limited, but still identify major concerns. Journal of Clinical Epidemiology, 60(7), 651–657.
Ferreira-González, I., Permanyer-Miralda, G., Domingo-Salvany, A., Busse, J., Heels-Ansdell, D., Montori, V., et al. (2007b). Problems with use of composite end points in cardiovascular trials: Systematic review of randomised controlled trials. The BMJ, 334(7597), 786.
Finkelstein, D., & Schoenfeld, D. (1999). Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine, 18(11), 1341–1354.
Fisher, L. D. (1998). Self-designing clinical trials. Statistics in Medicine, 17(14), 1551–1562.
Fitzmaurice, G. (2001). A conundrum in the analysis of change. Nutrition, 17(4), 360–361.
Follmann, D., Duerr, A., Tabet, S., Gilbert, P., Moodie, Z., Fast, P., et al. (2007). Endpoints and regulatory issues in HIV vaccine clinical trials: Lessons from a workshop. Journal of Acquired Immune Deficiency Syndromes (1999), 44(1), 49.
Follmann, D., Wittes, J., & Cutler, J. A. (1992). The use of subjective rankings in clinical trials with an application to cardiovascular disease. Statistics in Medicine, 11(4), 427–437.
Freemantle, N., Calvert, M., Wood, J., Eastaugh, J., & Griffin, C. (2003). Composite outcomes in randomized trials: Greater precision but with greater uncertainty? JAMA, 289(19), 2554.
Gail, M. H., Mark, S. D., Carroll, R. J., Green, S. B., & Pee, D. (1996). On design considerations and randomization-based inference for community intervention trials. Statistics in Medicine, 15(11), 1069–1092.
Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika, 52(1–2), 203–223.
Gómez, G., & Lagakos, S. W. (2013). Statistical considerations when using a composite endpoint for comparing treatment groups. Statistics in Medicine, 32(5), 719–738.
Gould, A. (1980). A new approach to the analysis of clinical drug trials with withdrawals. Biometrics, 36(4), 721–727.
Grech, E., & Ramsdale, D. (2003). Acute coronary syndrome: Unstable angina and non-st segment elevation myocardial infarction. The BMJ, 326(7401), 1259.
Hallstrom, A., Litwin, P., & Douglas Weaver, W. (1992). A method of assigning scores to the components of a composite outcome: An example from the MITI trial. Controlled Clinical Trials, 13(2), 148–155.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.
Hariharan, S., McBride, M. A., & Cohen, E. P. (2003). Evolution of endpoints for renal transplant outcome. American Journal of Transplantation, 3(8), 933–941.
Heddle, N. M., & Cook, R. J. (2011). Composite outcomes in clinical trials: What are they and when should they be used? Transfusion, 51(1), 11–13.
Huang, P., Woolson, R. F., & O’Brien, P. C. (2008). A rank-based sample size method for multiple outcomes in clinical trials. Statistics in Medicine, 27(16), 3084–3104.
Huque, M. F., Alosh, M., & Bhore, R. (2011). Addressing multiplicity issues of a composite endpoint and its components in clinical trials. Journal of Biopharmaceutical Statistics, 21(4), 610–634.
Kaufman, K. D., Olsen, E. A., Whiting, D., Savin, R., DeVillez, R., Bergfeld, W., et al. (1998). Finasteride in the treatment of men with androgenetic alopecia. Journal of the American Academy of Dermatology, 39(4), 578–589.
Kawaguchi, A., Koch, G. G., & Wang, X. (2011). Stratified multivariate Mann–Whitney estimators for the comparison of two treatments with randomization based covariance adjustment. Statistics in Biopharmaceutical Research, 3(2), 217–231.
Lachin, J. (1999). Worst-rank score analysis with informatively missing observations in clinical trials. Controlled Clinical Trials, 20(5), 408–422.
Lachin, J. M., & Bebu, I. (2015). Application of the Wei–Lachin multivariate one-directional test to multiple event-time outcomes. Clinical Trials, 12(6), 627–633. https://doi.org/10.1177/1740774515601027.
Li, D., Zhao, G., Paty, D., University of British Columbia MS/MRI Analysis Research Group, The SPECTRIMS Study Group. (2001). Randomized controlled trial of interferon-beta-1a in secondary progressive MS MRI results. Neurology, 56(11), 1505–1513.
Lisa, A. B., & James, S. H. (1997). Rule-based ranking schemes for antiretroviral trials. Statistics in Medicine, 16, 1175–1191.
Logan, B., & Tamhane, A. (2008). Superiority inferences on individual endpoints following noninferiority testing in clinical trials. Biometrical Journal, 50(5), 693–703.
Lubsen, J., Just, H., Hjalmarsson, A., La Framboise, D., Remme, W., Heinrich-Nols, J., et al. (1996). Effect of pimobendan on exercise capacity in patients with heart failure: Main results from the Pimobendan in Congestive Heart Failure (PICO) trial. Heart, 76(3), 223.
Lubsen, J., & Kirwan, B.-A. (2002). Combined endpoints: Can we use them? Statistics in Medicine, 21(19), 2959–2970.
Luo, X., Qiu, J., Bai, S., & Tian, H. (2017). Weighted win loss approach for analyzing prioritized outcomes. Statistics in Medicine, 36(15), 2452–2465.
Manja, V., AlBashir, S., & Guyatt, G. (2017). Criteria for use of composite end points for competing risks–a systematic survey of the literature with recommendations. Journal of Clinical Epidemiology, 82, 4–11.
Mascha, E. J., & Turan, A. (2012). Joint hypothesis testing and gatekeeping procedures for studies with multiple endpoints. Anesthesia & Analgesia, 114(6), 1304–1317.
Matsouaka, R. A., & Betensky, R. A. (2015). Power and sample size calculations for the Wilcoxon–Mann–Whitney test in the presence of death-censored observations. Statistics in Medicine, 34(3), 406–431.
Matsouaka, R. A., Singhal, A. B., & Betensky, R. A. (2016). An optimal Wilcoxon–Mann–Whitney test of mortality and a continuous outcome. Statistical Methods in Medical Research, 27(8), 2384–2400. https://doi.org/10.1177/0962280216680524
Minas, G., Rigat, F., Nichols, T. E., Aston, J. A., & Stallard, N. (2012). A hybrid procedure for detecting global treatment effects in multivariate clinical trials: Theory and applications to fMRI studies. Statistics in Medicine, 31(3), 253–268.
Moyé, L. (2013). Multiple analyses in clinical trials: Fundamentals for investigators. Berlin: Springer.
Moyé, L., Davis, B., & Hawkins, C. (1992). Analysis of a clinical trial involving a combined mortality and adherence dependent interval censored endpoint. Statistics in Medicine, 11(13), 1705–1717.
National Asthma Education and Prevention Program (National Heart, Lung, and Blood Institute). (2007). Third expert panel on the management of asthma. Expert panel report 3: Guidelines for the diagnosis and management of asthma. US Department of Health and Human Services, National Institutes of Health, National Heart, Lung, and Blood Institute.
Neaton, J., Gray, G., Zuckerman, B., & Konstam, M. (2005). Key issues in end point selection for heart failure trials: Composite end points. Journal of Cardiac Failure, 11(8), 567–575.
Neaton, J. D., Wentworth, D. N., Rhame, F., Hogan, C., Abrams, D. I., & Deyton, L. (1994). Considerations in choice of a clinical endpoint for aids clinical trials. Statistics in Medicine, 13(19–20), 2107–2125.
Newcombe, R. G. (2006). Confidence intervals for an effect size measure based on the Mann–Whitney statistic. part 2: Asymptotic methods and evaluation. Statistics in Medicine, 25(4), 559–573.
Oakes, J. M., & Feldman, H. A. (2001). Statistical power for nonequivalent pretest-posttest designs the impact of change-score versus ANCOVA models. Evaluation Review, 25(1), 3–28.
O’Brien, P. C. (1984). Procedures for comparing samples with multiple endpoints. Biometrics, 40, 1079–1087.
Parsons, M., Spratt, N., Bivard, A., Campbell, B., Chung, K., Miteff, F., et al. (2012). A randomized trial of tenecteplase versus alteplase for acute ischemic stroke. New England Journal of Medicine, 366(12), 1099–1107.
Pearl, J. (2014). Lord’s paradox revisited–(oh lord! kumbaya!). Tech. rep., Citeseer.
Pocock, S. J., Ariti, C. A., Collier, T. J., & Wang, D. (2011). The win ratio: A new approach to the analysis of composite endpoints in clinical trials based on clinical priorities. European Heart Journal, 33(2), 176–182.
Pratt, J. W. (1964). Robustness of some procedures for the two-sample location problem. Journal of the American Statistical Association, 59, 665–680.
Prieto-Merino, D., Smeeth, L., van Staa, T. P., & Roberts, I. (2013). Dangers of non-specific composite outcome measures in clinical trials. The BMJ, 347, f6782.
Ramchandani, R., Schoenfeld, D. A., & Finkelstein, D. M. (2016). Global rank tests for multiple, possibly censored, outcomes. Biometrics, 72, 926–935.
Röhmel, J., Gerlinger, C., Benda, N., & Läuter, J. (2006). On testing simultaneously non-inferiority in two multiple primary endpoints and superiority in at least one of them. Biometrical Journal, 48(6), 916–933.
Rosenbaum, P. R. (2006). Comment: The place of death in the quality of life. Statistical Science, 21(3), 313–316.
Rosner, B. (2015). Fundamentals of biostatistics. Toronto: Nelson Education.
Ross, S. (2007). Composite outcomes in randomized clinical trials: Arguments for and against. American Journal of Obstetrics and Gynecology, 196(2), 119–e1.
Rowan, J. A., Hague, W. M., Gao, W., Battin, M. R., & Moore, M. P. (2008). Metformin versus insulin for the treatment of gestational diabetes. New England Journal of Medicine, 358(19), 2003–2015.
Rubin, D. B. (2006). Rejoinder: Causal inference through potential outcomes and principal stratification: Application to studies with “censoring” due to death. Statistical Science, 21(3), 319–321.
Sampson, U. K., Metcalfe, C., Pfeffer, M. A., Solomon, S. D., & Zou, K. H. (2010). Composite outcomes: Weighting component events according to severity assisted interpretation but reduced statistical power. Journal of Clinical Epidemiology, 63(10), 1156–1158.
Samson, K. (2013). News from the AAN annual meeting: Why a trial of normobaric oxygen in acute ischemic stroke was halted early. Neurology Today, 13(10), 34–35.
Sankoh, A. J., Li, H., & D’Agostino, R. B. (2014). Use of composite endpoints in clinical trials. Statistics in Medicine, 33(27), 4709–4714.
Senn, S. (2006). Change from baseline and analysis of covariance revisited. Statistics in Medicine, 25(24), 4334–4344.
Shahar, E., & Shahar, D. J. (2012). Causal diagrams and change variables. Journal of Evaluation in Clinical Practice, 18(1), 143–148.
Singhal, A., Benner, T., Roccatagliata, L., Koroshetz, W., Schaefer, P., Lo, E., et al. (2005). A pilot study of normobaric oxygen therapy in acute ischemic stroke. Stroke, 36(4), 797.
Singhal, A. B. (2006). Normobaric oxygen therapy in acute ischemic stroke trial. ClinicalTrials.gov Database. http://clinicaltrials.gov/ct2/show/NCT00414726
Singhal, A. B. (2007). A review of oxygen therapy in ischemic stroke. Neurological Research, 29(2), 173–183.
Spencer, S., Mayer, B., Bendall, K. L., & Bateman, E. D. (2007). Validation of a guideline-based composite outcome assessment tool for asthma control. Respiratory Research, 8(1), 26.
Subherwal, S., Anstrom, K. J., Jones, W. S., Felker, M. G., Misra, S., Conte, M. S., et al. (2012). Use of alternative methodologies for evaluation of composite end points in trials of therapies for critical limb ischemia. American Heart Journal, 164(3), 277.
Sun, H., Davison, B. A., Cotter, G., Pencina, M. J., & Koch, G. G. (2012). Evaluating treatment efficacy by multiple end points in phase ii acute heart failure clinical trials analyzing data using a global method. Circulation: Heart Failure, 5(6), 742–749.
Tomlinson, G., & Detsky, A. S. (2010). Composite end points in randomized trials: There is no free lunch. JAMA, 303(3), 267–268.
Tyler, K. M., Normand, S.-L. T., & Horton, N. J. (2011). The use and abuse of multiple outcomes in randomized controlled depression trials. Contemporary Clinical Trials, 32(2), 299–304.
van Breukelen, G. J. (2013). ANCOVA versus CHANGE from baseline in nonrandomized studies: The difference. Multivariate Behavioral Research, 48(6), 895–922.
Van Elteren, P. (1960). On the combination of independent two-sample tests of Wilcoxon. Bulletin of the International Statistical Institute, 37, 351–361.
Wen, X., Hartzema, A., Delaney, J. A., Brumback, B., Liu, X., Egerman, R., et al. (2017). Combining adverse pregnancy and perinatal outcomes for women exposed to antiepileptic drugs during pregnancy, using a latent trait model. BMC Pregnancy and Childbirth, 17(1), 10.
Willett, J. B. (1988). Questions and answers in the measurement of change. Review of Research in Education, 15, 345–422.
Wilson, R. F., & Berger, A. K. (2011). Are all end points created equal? The case for weighting. Journal of the American College of Cardiology, 57(5), 546–548.
Young, F. B., Weir, C. J., Lees, K. R., & GAIN International Trial Steering Committee and Investigators. (2005). Comparison of the national institutes of health stroke scale with disability outcome measures in acute stroke trials. Stroke, 36(10), 2187–2192.
Zhang, J., Quan, H., Ng, J., & Stepanavage, M. E. (1997). Some statistical methods for multiple endpoints in clinical trials. Controlled Clinical Trials, 18(3), 204–221.
Zhao, Y. (2006). Sample size estimation for the van Elteren test—a stratified Wilcoxon–Mann–Whitney test. Statistics in Medicine, 25(15), 2675–2687.
Acknowledgements
This work was supported by grants P50-NS051343, R01-CA075971, T32 NS048005, 1RO1HL118336-01, and UL1TR001117 awarded by the National Institutes of Health. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official view of the National Institutes of Health.
Conflict of Interest: None declared.
Appendices
Appendix 1: Proof of Theorem 1.1
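We consider \(\widetilde {X}_{ij}=\delta _{ij}(\eta +T_{ij})+(1-\delta _{ij})X_{ij}\), for \(i=1, 2\) and \(j=1,\ldots , N_i\).
For \(q_i=1-p_i\), we have
$$\begin{aligned} \pi_{U1}=P(\widetilde{X}_{1k}<\widetilde{X}_{2l})=p_1p_2\pi_{t1}+p_1q_2+q_1q_2\pi_{x1}, \end{aligned}$$
where \(\pi_{t1}=P(T_{1k}<T_{2l}|T_{1k}\leq T_{max},~T_{2l}\leq T_{max})\) and \(\pi_{x1}=P(X_{1k}<X_{2l}).\)
Define \(U_{kl}=I(\widetilde {X}_{1k}<\widetilde {X}_{2l})+\frac {1}{2}I(\widetilde {X}_{1k}=\widetilde {X}_{2l}),\) for \(k=1,\ldots ,N_1\) and \(l=1,\ldots ,N_2\), and \(U=(N_1N_2)^{-1}\sum _{k=1}^{N_1}\sum _{l=1}^{N_2}U_{kl}\). In the absence of ties, the binary variable \(U_{kl}=I(\widetilde {X}_{1k}<\widetilde {X}_{2l})\) follows a Bernoulli distribution with probability \(\pi _{U1}\). Its mean and variance are, respectively, \(E(U_{kl})=\pi _{U1}\) and \(Var(U_{kl})=E(U_{kl})\left [1-E(U_{kl})\right ]=\pi _{U1}(1-\pi _{U1})\). We can use these results to derive the variance of \(U\) from the following formula:
$$\begin{aligned} Var(U)=(N_1N_2)^{-2}\sum_{k=1}^{N_1}\sum_{l=1}^{N_2}\sum_{k'=1}^{N_1}\sum_{l'=1}^{N_2}Cov(U_{kl}, U_{k'l'}), \end{aligned}$$
in which the terms with \(k=k'\) and \(l=l'\) equal \(Var(U_{kl})\).
Note that when \(k\neq k'\) and \(l\neq l'\), the two pairs share no observation, so the covariance
$$\begin{aligned} Cov(U_{kl}, U_{k'l'})=0. \end{aligned}$$
When exactly one of the two indices differs, we have
$$\begin{aligned} Cov(U_{kl}, U_{k'l})&=\pi_{U2}-\pi_{U1}^2, \quad k\neq k',\\ Cov(U_{kl}, U_{kl'})&=\pi_{U3}-\pi_{U1}^2, \quad l\neq l', \end{aligned}$$
where \(\pi _{U2}=E(U_{kl} U_{k'l})\) and \(\pi _{U3}=E(U_{kl} U_{kl'}).\)
Therefore,
$$\begin{aligned} Var(U)=(N_1N_2)^{-1}\left[\pi_{U1}\left(1-\pi_{U1}\right)+(N_1-1)\left(\pi_{U2}-\pi_{U1}^2\right)+(N_2-1)\left(\pi_{U3}-\pi_{U1}^2\right)\right]. \end{aligned}$$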
1. No ties:
When there are no ties, \(I(\widetilde {X}_{1k}=\widetilde {X}_{2l})=0.\) In that case, \(U_{kl}=I(\widetilde {X}_{1k}<\widetilde {X}_{2l})= \delta _{1k}\delta _{2l}I({T}_{1k}<{T}_{2l}|T_{1k}\leq T_{max}, ~T_{2l}\leq T_{max})+\delta _{1k}(1-\delta _{2l}) + (1-\delta _{1k})(1-\delta _{2l})I({X}_{1k}<{X}_{2l})\), for \(k=1,\ldots ,N_1\) and \(l=1,\ldots ,N_2\). We have
$$\begin{aligned} E(U_{kl}U_{k'l})&= P(T_{1k}<T_{2l}, T_{1k'}<T_{2l}|\delta_{1k}\delta_{1k'}\delta_{2l}=1)P(\delta_{1k}\delta_{1k'}\delta_{2l}=1)\\ &\quad+ P(X_{1k'}<X_{2l})P(\delta_{1k}=1, ~\delta_{1k'}=\delta_{2l}=0)\\ &\quad+P(X_{1k}<X_{2l})P(\delta_{1k}=\delta_{2l}=0)P(\delta_{1k'}=1)\\ &\quad+ P(X_{1k}<X_{2l}, X_{1k'}<X_{2l})P(\delta_{1k}=\delta_{1k'}=\delta_{2l}=0)\\ &\quad+P(\delta_{1k}\delta_{1k'}=1)P(\delta_{2l}=0)\\ &= p_1^2p_2\pi_{t2}+2p_1q_1q_2\pi_{x1}+q_1^2q_2\pi_{x2}+p_1^2q_2\equiv \pi_{U2},\\ E(U_{kl}U_{kl'})&= P(T_{1k}<T_{2l},~ T_{1k}<T_{2l'}|\delta_{1k}\delta_{2l}\delta_{2l'}=1)P(\delta_{1k}\delta_{2l}\delta_{2l'}=1)\\ &\quad+ P(T_{1k}<T_{2l}|\delta_{1k}\delta_{2l}=1, ~\delta_{2l'}=0)P(\delta_{1k}\delta_{2l}=1)P(\delta_{2l'}=0)\\ &\quad+ P(T_{1k}<T_{2l'}|\delta_{1k}=1, ~\delta_{2l}=0, ~\delta_{2l'}=1)P(\delta_{1k}\delta_{2l'}=1)P(\delta_{2l}=0)\\ &\quad+ P(X_{1k}<X_{2l}, X_{1k}<X_{2l'})P(\delta_{1k}=\delta_{2l}=\delta_{2l'}=0)\\ &\quad+ P(\delta_{1k}=1)P(\delta_{2l}=\delta_{2l'}=0)\\ &= p_1p_2^2\pi_{t3}+2p_1p_2q_2\pi_{t1}+q_1q_2^2\pi_{x3}+p_1q_2^2\equiv \pi_{U3},\\ \text{with} ~~~ \pi_{t2}&= P(T_{1k}<T_{2l}, T_{1k'}<T_{2l}|T_{1k}\leq T_{max},~ T_{1k'}\leq T_{max}, ~T_{2l}\leq T_{max}),\\ \pi_{x2}&= P(X_{1k}<X_{2l}, X_{1k'}<X_{2l}), \\ \pi_{t3} &= P(T_{1k}<T_{2l}, T_{1k}<T_{2l'}| T_{1k}\leq T_{max},~ T_{2l}\leq T_{max}, ~T_{2l'}\leq T_{max}), \\ \pi_{x3}&= P(X_{1k}<X_{2l}, X_{1k}<X_{2l'}) . \end{aligned}$$
Under the null hypothesis of no difference between the two groups with respect to survival and the nonfatal outcome, we have \(F_1=F_2=F\), \(G_1=G_2=G\), \(p_1=p_2=p\), and \(q_1=q_2=q\). This implies
$$\begin{aligned} \pi_{t1}&= P(T_{1k}<T_{2l}|T_{1k}\leq T_{max}, T_{2l}\leq T_{max})\\ &= \frac{1}{2p^2}\left[F(T_{max})^2-F(0)^2\right]=\frac{1}{2},\\ \pi_{t2}&= P(T_{1k}<T_{2l}, T_{1k'}<T_{2l}|T_{1k}\leq T_{max},T_{1k'}\leq T_{max},T_{2l}\leq T_{max})\\ &= \frac{1}{p^3}\int_0^{T_{max}}F(t)^2dF(t)\\ &= \frac{1}{3p^3}\left[F(T_{max})^3-F(0)^3\right]=\frac{1}{3},\\ \pi_{t3} &= P(T_{1k}<T_{2l}, T_{1k}<T_{2l'}|T_{1k}\leq T_{max}, T_{2l}\leq T_{max}, T_{2l'}\leq T_{max})\\ &= \frac{1}{p^3}\int_0^{T_{max}}\left[F(T_{max})-F(t)\right]^2dF(t)\\ &= \frac{1}{3p^3}\left[F(T_{max})-F(0)\right]^3=\frac{1}{3},\\ \pi_{x1}&= P(X_{1k}<X_{2l})=\int_{-\infty}^{\infty}G(x)dG(x)=\frac{1}{2}\left[G(x)^2\right]_{-\infty}^{\infty}=\frac{1}{2},\\ \pi_{x2}&= P(X_{1k}<X_{2l}, X_{1k'}<X_{2l})=\int_{-\infty}^{\infty}G(x)^2dG(x)=\frac{1}{3}\left[G(x)^3\right]_{-\infty}^{\infty}=\frac{1}{3},\\ \pi_{x3}&= P( X_{1k}<X_{2l}, X_{1k}<X_{2l'})=\int_{-\infty}^{\infty}[1-G(x)]^2dG(x)\\ &= -\frac{1}{3}\left\{[1-G(x)]^3\right\}_{-\infty}^{\infty}=\frac{1}{3}, \end{aligned}$$
where we used \(F(0)=0\) and \(F(T_{max})=p\). Therefore,
$$\begin{aligned} \pi_{U1}&= p_1p_2\pi_{t1}+p_1q_2+q_1q_2\pi_{x1}\\&= \frac{1}{2}p^2+pq+\frac{1}{2}q^2=\frac{1}{2}(p+q)^2=\frac{1}{2},\\ \pi_{U2}&= p_1^2q_2+p_1^2p_2\pi_{t2}+2p_1q_1q_2\pi_{x1}+q_1^2q_2\pi_{x2}\\ &= p^2q+\frac{1}{3}p^3+pq^2+\frac{1}{3}q^3=\frac{1}{3}(p+q)^3=\frac{1}{3},\\ \pi_{U3}&= p_1q_2^2+p_1p_2^2\pi_{t3}+2p_1p_2q_2\pi_{t1}+q_1q_2^2\pi_{x3}\\ &= pq^2+\frac{1}{3}p^3+p^2q+\frac{1}{3}q^3=\frac{1}{3}(p+q)^3=\frac{1}{3}. \end{aligned}$$
The mean and variance become
$$\begin{aligned} \mu_0&= E_0(U) =\pi_{U1}=\frac{1}{2};\\ \sigma^2_0&= Var_0(U)\\ &= (N_1N_2)^{-1}\left[\pi_{U1}\left(1-\pi_{U1}\right)+(N_1-1)\left(\pi_{U2}-\pi_{U1}^2\right)+(N_2-1)\left(\pi_{U3}-\pi_{U1}^2\right)\right] \\ &= (N_1N_2)^{-1}\left[\frac{1}{2}\left(1-\frac{1}{2}\right)+(N_1-1)\left(\frac{1}{3}-\left(\frac{1}{2}\right)^2\right)+(N_2-1)\left(\frac{1}{3}-\left(\frac{1}{2}\right)^2\right)\right] \\ &= (N_1N_2)^{-1}\left[\frac{1}{4}+\frac{1}{12}(N_1-1)+\frac{1}{12}(N_2-1)\right] =\frac{N_1+N_2+1}{12N_1N_2}. \end{aligned}$$
2. Ties are present:
More generally, we can approximate the probabilities \(\pi _{U2}=E(U_{kl} U_{k'l})\) and \(\pi _{U3}=E(U_{kl} U_{kl'})\) using their unbiased estimators.
Following Hanley and McNeil (1982), we can show that the variance \(Var(U)\) can be estimated by:
$$\begin{aligned}(N_1N_2)^{-1}\left[\widehat\pi_{U1}\left(1-\widehat\pi_{U1}\right)+(N_1-1)(\widehat\pi_{U2}-\widehat\pi_{U1}^2)+(N_2-1)(\widehat\pi_{U3}-\widehat\pi_{U1}^2)\right]\end{aligned}$$
where \(\widehat \pi _{U1}=(N_1N_2)^{-1}\sum _{k=1}^{N_1} \sum _{l=1}^{N_2} U_{kl}\), \(\widehat \pi _{U2}=(N_1^2N_2)^{-1}\sum _{l=1}^{N_2} U_{\bullet l}^2\), and \(\widehat \pi _{U3}=(N_1N_2^2)^{-1}\sum _{k=1}^{N_1} U_{k\bullet }^2\), with \(U_{k\bullet }=\sum _{l=1}^{N_2}U_{kl}\) and \(U_{\bullet l}=\sum _{k=1}^{N_1}U_{kl}\). The column sums \(U_{\bullet l}\) pair two group-1 observations with the same group-2 observation, which matches \(\pi _{U2}=E(U_{kl}U_{k'l})\); the row sums \(U_{k\bullet }\) match \(\pi _{U3}=E(U_{kl}U_{kl'})\). In the absence of ties, \(\widehat \pi _{U2}\) and \(\widehat \pi _{U3}\) are, respectively, estimates of \(\pi _{U2}\) and \(\pi _{U3}\).
One can also consider other possible approximations of the variance of U using the exposition provided by Newcombe (2006).
As we know,
$$\begin{aligned}P(\widetilde{X}_{1k}<\widetilde{X}_{2l})+P(\widetilde{X}_{1k}>\widetilde{X}_{2l})+P(\widetilde{X}_{1k}=\widetilde{X}_{2l})=1.\end{aligned}$$
Under the null hypothesis, i.e., when \(\widetilde {X}_{1k}\) and \(\widetilde {X}_{2l}\) are identically distributed, we have \(P(\widetilde {X}_{1k}<\widetilde {X}_{2l})=P(\widetilde {X}_{1k}>\widetilde {X}_{2l})\), which implies \(P(\widetilde {X}_{1k}<\widetilde {X}_{2l})+\frac {1}{2}P(\widetilde {X}_{1k}=\widetilde {X}_{2l})=\frac {1}{2}.\) Therefore,
$$\begin{aligned}E(U) =E(U_{kl})=P(\widetilde{X}_{1k}<\widetilde{X}_{2l})+\frac{1}{2}P(\widetilde{X}_{1k}=\widetilde{X}_{2l})=\frac{1}{2}.\end{aligned}$$
The variance reduces to:
$$\begin{aligned} \sigma^2_0= Var_0(U)=\frac{1}{12N_1N_2}\left( N_1+N_2+1-\frac{\displaystyle\sum_{\nu=1}^{g}t_{\nu}(t_{\nu}^2-1)}{(N_1+N_2)(N_1+N_2-1)}\right) \end{aligned}$$
where \(t_{\nu}\) is the number of observations in the \(\nu\)-th block of tied observations and \(g\) is the number of such blocks (see, for instance, Rosner 2015).
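The quantities in this appendix can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`worst_rank_composite`, `wmw_u`, `null_variance`, `plugin_variance`) and the toy data are hypothetical and are not taken from the chapter. It builds the worst-rank composite \(\widetilde{X}\), computes \(U\) as the average of the \(U_{kl}\), evaluates the tie-corrected null variance, and evaluates a plug-in variance in the style of Hanley and McNeil (1982), pairing column sums with \(\pi_{U2}=E(U_{kl}U_{k'l})\) and row sums with \(\pi_{U3}=E(U_{kl}U_{kl'})\).

```python
import numpy as np

def worst_rank_composite(time, outcome, died, eta):
    """X~ = delta*(eta + T) + (1 - delta)*X.
    Assumption: eta is chosen so that eta + T lies below every survivor's outcome."""
    died = np.asarray(died, dtype=bool)
    return np.where(died, eta + np.asarray(time, dtype=float),
                    np.asarray(outcome, dtype=float))

def wmw_u(x1, x2):
    """Return U and the N1 x N2 matrix with entries U_kl = I(x1<x2) + 0.5*I(x1==x2)."""
    a = np.asarray(x1, dtype=float)[:, None]
    b = np.asarray(x2, dtype=float)[None, :]
    kernel = (a < b) + 0.5 * (a == b)
    return kernel.mean(), kernel

def null_variance(x1, x2):
    """Tie-corrected null variance of U; equals (N1+N2+1)/(12*N1*N2) without ties."""
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    _, counts = np.unique(np.concatenate([x1, x2]), return_counts=True)
    tie_term = np.sum(counts * (counts**2 - 1)) / (n * (n - 1))
    return (n + 1 - tie_term) / (12 * n1 * n2)

def plugin_variance(kernel):
    """Plug-in estimate of Var(U) from the matrix of U_kl values."""
    n1, n2 = kernel.shape
    pi1 = kernel.mean()
    # Column sums share one group-2 observation: estimate of E(U_kl U_k'l)
    pi2 = np.sum(kernel.sum(axis=0) ** 2) / (n1**2 * n2)
    # Row sums share one group-1 observation: estimate of E(U_kl U_kl')
    pi3 = np.sum(kernel.sum(axis=1) ** 2) / (n1 * n2**2)
    return (pi1 * (1 - pi1) + (n1 - 1) * (pi2 - pi1**2)
            + (n2 - 1) * (pi3 - pi1**2)) / (n1 * n2)

# Toy data (hypothetical): one death in group 1, none in group 2
x1 = worst_rank_composite([2.0, 0.0, 0.0], [0.0, 5.0, 7.0], [True, False, False], eta=-10.0)
x2 = worst_rank_composite([0.0, 0.0], [6.0, 8.0], [False, False], eta=-10.0)
u, kernel = wmw_u(x1, x2)
print(u, null_variance(x1, x2), plugin_variance(kernel))
```

The null variance depends only on the sample sizes and the tie pattern, whereas the plug-in estimate uses the observed \(U_{kl}\) and therefore also applies away from the null.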
Appendix 2: Mean and Variance of the Weighted U-Statistic
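Given the weights \(\mathbf{w}=(w_1, w_2)\), we define the vector \(\mathbf {c}'=(c_1, c_2, c_3)=\left (w_1^2, w_1w_2, w_2^2\right )\). Let \(\widetilde {X}_{1k}=w_1\delta _{1k}(\eta +T_{1k})+w_2(1-\delta _{1k}) X_{1k},\) for \(k=1,\ldots ,N_1\), and \(\widetilde {X}_{2l}=w_1\delta _{2l}(\eta +T_{2l})+w_2(1-\delta _{2l})X_{2l},\) for \(l=1,\ldots ,N_2\). We define the weighted WMW U-statistic by \(\mathbf{c}'\mathbf{U}=c_1U_t+c_2U_{tx}+c_3U_x\), where \(\mathbf{U}'=(U_t, U_{tx}, U_x)\) and
$$\begin{aligned} U_t&=(N_1N_2)^{-1}\sum_{k=1}^{N_1}\sum_{l=1}^{N_2}\delta_{1k}\delta_{2l}I(T_{1k}<T_{2l}),\\ U_{tx}&=(N_1N_2)^{-1}\sum_{k=1}^{N_1}\sum_{l=1}^{N_2}\delta_{1k}(1-\delta_{2l}),\\ U_x&=(N_1N_2)^{-1}\sum_{k=1}^{N_1}\sum_{l=1}^{N_2}(1-\delta_{1k})(1-\delta_{2l})I(X_{1k}<X_{2l}). \end{aligned}$$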
In the absence of ties, the variance \(Var(\mathbf {U})={\varSigma } = (N_1N_2)^{-1}\left (\varSigma _{ij}\right )_{\substack {1\leq i, j\leq 3}}\) is a 3 × 3 matrix such that
Therefore,
Under the null hypothesis of no difference between the two groups with respect to both survival and the nonfatal outcome, we have \(p_1=p_2=p\), \(q_1=q_2=q=1-p\), \(\pi_{t1}=\pi_{x1}=1/2\), and \(\pi_{t2}=\pi_{x2}=\pi_{t3}=\pi_{x3}=1/3\). Thus,
where \({\varSigma _0}=(N_1N_2)^{-1}\left (\varSigma _{0ij}\right )_{\substack {1\leq i, j\leq 3}}\) is a symmetric matrix with
Moreover, since \(Var_0(\mathbf{c}'\mathbf{U})=\mathbf{c}'\varSigma_0\mathbf{c}\geq 0\) by definition, the matrix \(\varSigma_0\) is positive semi-definite. In practice, \(p\) is estimated by the pooled sample proportion \(\hat p=(N_1\widehat p_1+N_2\widehat p_2)/(N_1+N_2)\), and both \(E_0(\mathbf{U})\) and \(Var_0(\mathbf{U})\) are calculated accordingly.
Finally, when ties are present, the foregoing formulas can be modified easily as we did in the non-weighted case to account for the ties in the variance estimations.
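As an illustration of the weighted statistic, the sketch below computes three pairwise components and combines them with \(\mathbf{c}=(w_1^2, w_1w_2, w_2^2)\). It assumes that \(U_t\), \(U_{tx}\), and \(U_x\) are the averages of the three terms in the no-ties decomposition of \(U_{kl}\) from Appendix 1 (death versus death, death versus survivor, survivor versus survivor); the function name `weighted_wmw` and the toy data are hypothetical, and NumPy is assumed.

```python
import numpy as np

def weighted_wmw(t1, x1, died1, t2, x2, died2, w1, w2):
    """Return (c'U, U) with U = (U_t, U_tx, U_x) and c = (w1^2, w1*w2, w2^2).
    Assumption: no ties; entries of t for survivors and x for deaths are ignored."""
    d1 = np.asarray(died1, dtype=bool)[:, None]   # group 1 indexes the rows
    d2 = np.asarray(died2, dtype=bool)[None, :]   # group 2 indexes the columns
    t1 = np.asarray(t1, dtype=float)[:, None]
    t2 = np.asarray(t2, dtype=float)[None, :]
    x1 = np.asarray(x1, dtype=float)[:, None]
    x2 = np.asarray(x2, dtype=float)[None, :]

    u_t = np.mean(d1 & d2 & (t1 < t2))      # both died: earlier death is worse
    u_tx = np.mean(d1 & ~d2)                # group-1 death versus group-2 survivor
    u_x = np.mean(~d1 & ~d2 & (x1 < x2))    # both survived: compare the outcome
    u = np.array([u_t, u_tx, u_x])
    c = np.array([w1**2, w1 * w2, w2**2])
    return float(c @ u), u

# Toy data (hypothetical values); placeholders of 0.0 fill the unused entries
stat, u = weighted_wmw(
    t1=[2.0, 0.0, 0.0], x1=[0.0, 5.0, 7.0], died1=[True, False, False],
    t2=[3.0, 0.0], x2=[0.0, 6.0], died2=[True, False],
    w1=0.25, w2=0.75,
)
print(stat, u)
```

With \(w_1=w_2=1\) the three components are summed with equal weight, which returns the unweighted no-ties statistic \(U\) of Appendix 1.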
Appendix 3: Optimal Weights
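From Eq. (1.15), we have
$$\begin{aligned} \frac{\mu_{1w}-\mu_{0w}}{\sigma_{0w}}=\frac{\mathbf{c}'\boldsymbol{\mu}}{\left(\mathbf{c}'\varSigma_0\mathbf{c}\right)^{\frac{1}{2}}}, \end{aligned}$$
where \(\boldsymbol {\mu }'=\left (\pi _{t1}p_1p_2-\frac {1}{2}p^2, p_1q_2-pq , \pi _{x1}q_1q_2-\frac {1}{2}q^2\right )\) and \(\mathbf {c}'=(c_1, c_2, c_3)\) with \(c_1+2c_2+c_3=1\).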
We assume that det(Σ 0) > 0, i.e., Σ 0 is positive definite. Maximizing \(\displaystyle \frac {|\mu _{1w}-\mu _{0w}|}{\sigma _{0w}},\) subject to c 1 + 2c 2 + c 3 = 1, with respect to c corresponds to maximizing the Lagrange function:
with respect to the vector c and λ, where λ is the Lagrange multiplier and b ′ = (1, 2, 1). Let \(K(\mathbf {c})=sign(\mathbf {c}'\mu )[( \mathbf {c}'\varSigma _0 \mathbf {c})^{-\frac {3}{2}}]\), we have
From (1.18) and (1.19), we have
because both \((\mathbf{c}'\varSigma_0\mathbf{c})\) and \((\mathbf{c}'\mu)\) are scalars and \(\mathbf{c}'\mathbf{b} = c_1 + 2c_2 + c_3 = 1\).
Then, Eq. (1.18) implies \((\mathbf{c}'\varSigma_0\mathbf{c})\mu = (\varSigma_0\mathbf{c})(\mathbf{c}'\mu)\), i.e., \(\displaystyle \mu =(\varSigma _0 \mathbf {c})\frac {( \mathbf {c}'\mu )}{( \mathbf {c}'\varSigma _0 \mathbf {c})}=\varSigma _0\frac {( \mathbf {c}'\mu )}{( \mathbf {c}'\varSigma _0 \mathbf {c})}\mathbf {c}.\) Since we assume that the matrix \(\varSigma _0^{-1}\) exists, this implies
and thus, \(\displaystyle \mathbf {b}'\varSigma _0^{-1}\mu =\frac {( \mathbf {c}'\mu )}{( \mathbf {c}'\varSigma _0 \mathbf {c})}\mathbf {b}'\mathbf {c}=\frac {( \mathbf {c}'\mu )}{( \mathbf {c}'\varSigma _0 \mathbf {c})}\).
Replacing \(\displaystyle \frac {( \mathbf {c}'\mu )}{( \mathbf {c}'\varSigma _0 \mathbf {c})}\) by \(\displaystyle \mathbf {b}'\varSigma _0^{-1}\mu \) in Eq. (1.20) yields \(\displaystyle \varSigma _0^{-1}\mu =\displaystyle (\mathbf {b}'\varSigma _0^{-1}\mu )\mathbf {c}.\) Therefore, the optimal weight-vector is
as long as \(\mathbf {b}'\varSigma _0^{-1}\boldsymbol {\mu }\neq 0\). In addition,
Since \(\varSigma_0\) is positive definite, we can show that the border-preserving principal minors of order \(k > 2\) have sign \((-1)^k\). Therefore, \( \displaystyle {\mathbf {c}}_{opt}=\frac {\varSigma _0^{-1}\mu }{\mathbf {b}'\varSigma _0^{-1}\boldsymbol {\mu }}\) maximizes \(O(\mathbf{c})\).
Let us define two vectors \(\mathbf {d_1}' = (1, 1, 0)\) and \(\mathbf {d_2}' = \mathbf{b}'-\mathbf {d_1}' = (0, 1, 1)\). To calculate \(w_1\) and \(w_2\), we just need to consider the relationships \(\mathbf {c}=(w_1^2, w_1w_2, w_2^2)\) and \(w_1 + w_2 = 1\). We have \(\mathbf {d_1}'\mathbf {c}=w_1^2+w_1(1-w_1)=w_1.\) Therefore, using the result given in Eq. (1.21), we can deduce \(\displaystyle w_1=\mathbf {d_1}'\mathbf {c}=\frac {\mathbf {d_1}'\varSigma _0^{-1}\mu }{\mathbf {b}'\varSigma _0^{-1}\boldsymbol {\mu }}\) and \(\displaystyle w_2=1-\mathbf {d_1}'\mathbf {c}=\frac {(\mathbf {b}'-\mathbf {d_1}')\varSigma _0^{-1}\mu }{\mathbf {b}'\varSigma _0^{-1}\boldsymbol {\mu }}=\frac {\mathbf {d_2}'\varSigma _0^{-1}\mu }{\mathbf {b}'\varSigma _0^{-1}\boldsymbol {\mu }}.\)
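The closed-form expressions above can be evaluated numerically once \(\varSigma_0\) and \(\mu\) are available. The sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, the inputs are placeholder values rather than quantities from the trial, and \(\varSigma_0^{-1}\mu\) is obtained by solving a 3 × 3 linear system instead of inverting the matrix.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * y = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("Sigma_0 appears singular; its inverse is assumed to exist")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def optimal_weights(sigma0, mu):
    """Return (c_opt, w1, w2) from c_opt = Sigma_0^{-1} mu / (b' Sigma_0^{-1} mu)."""
    b = (1.0, 2.0, 1.0)
    d1 = (1.0, 1.0, 0.0)
    y = solve_linear_system(sigma0, mu)  # y = Sigma_0^{-1} mu
    denom = sum(bi * yi for bi, yi in zip(b, y))
    if denom == 0:
        raise ValueError("b' Sigma_0^{-1} mu must be nonzero")
    c_opt = [yi / denom for yi in y]
    w1 = sum(di * ci for di, ci in zip(d1, c_opt))  # w1 = d1' c_opt
    w2 = 1.0 - w1                                   # equals d2' c_opt since b' c_opt = 1
    return c_opt, w1, w2


# Placeholder inputs (not from the chapter): identity covariance, equal effects.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
c_opt, w1, w2 = optimal_weights(identity, [1, 1, 1])
```

Because \(\mathbf{c}_{opt}\) is a ratio, any common scale factor in \(\varSigma_0\), such as \((N_1N_2)^{-1}\), cancels and does not affect the weights.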
Appendix 4: Conditional Probabilities
1.1 Exponential Distribution
Suppose that the death times \(t_1, t_2\) follow exponential distributions with hazards \(\lambda_1, \lambda_2\), respectively, and denote \(\displaystyle \theta =\frac {\lambda _1}{\lambda _2},~~q_1=q_2^{\theta }\), and \(q_2=e^{-T\lambda _2}.\) Given that \(P(\delta_{1k} = 1) = p_1\), \(P(\delta_{2l} = 1) = p_2\), we have
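To make the exponential setup concrete, the sketch below computes \(q_2\), \(\theta\), and \(q_1 = q_2^{\theta}\) as defined above, together with the mortality probabilities \(p_j = 1 - q_j\). As a further illustration, it evaluates the probability that a group-1 death precedes a group-2 death given that both occur before \(T\). That last expression is a standard calculation for independent exponential times, added here as an assumption; it is not a reproduction of the chapter's own conditional-probability formulas. All function names and numeric inputs are hypothetical.

```python
import math


def exponential_survival_quantities(lambda1, lambda2, follow_up):
    """Return theta, q1, q2, p1, p2 for hazards lambda1, lambda2 and follow-up T."""
    q2 = math.exp(-follow_up * lambda2)  # survival probability at T, group 2
    theta = lambda1 / lambda2            # hazard ratio
    q1 = q2 ** theta                     # equals exp(-T * lambda1)
    return {"theta": theta, "q1": q1, "q2": q2, "p1": 1 - q1, "p2": 1 - q2}


def prob_death1_before_death2(lambda1, lambda2, follow_up):
    """P(t1 < t2 | t1 <= T, t2 <= T) for independent exponential death times.

    Illustrative derivation (not taken from the chapter): integrate the density
    of t1 against P(s < t2 <= T) over s in [0, T], then divide by p1 * p2.
    """
    total = lambda1 + lambda2
    joint = (lambda1 / total) * (1 - math.exp(-total * follow_up)) \
        - math.exp(-lambda2 * follow_up) * (1 - math.exp(-lambda1 * follow_up))
    quantities = exponential_survival_quantities(lambda1, lambda2, follow_up)
    return joint / (quantities["p1"] * quantities["p2"])


# Hypothetical hazards and follow-up time, for illustration only.
summary = exponential_survival_quantities(lambda1=0.2, lambda2=0.1, follow_up=3.0)
null_probability = prob_death1_before_death2(0.1, 0.1, 3.0)
```

With equal hazards the conditional probability is 1/2, in line with the null value \(\pi_{t1} = 1/2\) stated in Appendix 2.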
1.2 Normal Distribution
Suppose that the nonfatal outcomes \(X_1, X_2\) follow normal distributions \(N(\mu _{x_1}, \sigma _{x_1})\) and \(N(\mu _{x_2}, \sigma _{x_2})\), respectively.
Consider \(\displaystyle \varDelta _{x}{=}\frac {\mu _{x_2}-\mu _{x_1} }{\sqrt {\sigma _{x_1}^2+\sigma _{x_2}^2}}~\), \(\displaystyle \rho _{x_j}{=}\frac {\sigma _{x_j}^2}{\sigma _{x_1}^2+\sigma _{x_2}^2} \), and \(\displaystyle Z_{kl}= \frac {X_{1k}-X_{2l}-(\mu _{x_1}-\mu _{x_2})}{\sqrt {\sigma _{x_1}^2+\sigma _{x_2}^2}}\).
We can show that
\( (Z_{kl},Z_{k'l})~\sim N\left ( \left (\begin {array}{l} 0\\ 0 \end {array}\right ) , \left (\begin {array}{lr} 1&\rho _{x_2}\\ \rho _{x_2}&1 \end {array}\right )\right ) \) and \( (Z_{kl},Z_{kl'})~\sim N\left ( \left (\begin {array}{l} 0\\ 0 \end {array}\right ) , \left (\begin {array}{lr} 1&\rho _{x_1}\\ \rho _{x_1}&1 \end {array}\right )\right ). \)
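The quantities \(\varDelta_x\) and \(\rho_{x_j}\) defined above are straightforward to compute. The sketch below does so and also evaluates \(P(X_{1k} < X_{2l}) = \Phi(\varDelta_x)\), which follows because \(X_{1k} < X_{2l}\) is equivalent to \(Z_{kl} < \varDelta_x\) and \(Z_{kl}\) is standard normal. It assumes independent outcomes and treats \(\sigma_{x_j}\) as a standard deviation; the function names and inputs are hypothetical.

```python
import math


def standard_normal_cdf(z):
    """Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_outcome_quantities(mu1, sd1, mu2, sd2):
    """Return Delta_x, rho_x1, rho_x2 and P(X1 < X2) for independent normals."""
    total_variance = sd1 ** 2 + sd2 ** 2
    delta_x = (mu2 - mu1) / math.sqrt(total_variance)
    rho_x1 = sd1 ** 2 / total_variance  # correlation of (Z_kl, Z_kl'): shared X_1k
    rho_x2 = sd2 ** 2 / total_variance  # correlation of (Z_kl, Z_k'l): shared X_2l
    return {
        "delta_x": delta_x,
        "rho_x1": rho_x1,
        "rho_x2": rho_x2,
        "prob_x1_less_x2": standard_normal_cdf(delta_x),
    }


# Hypothetical means and standard deviations, for illustration only.
result = normal_outcome_quantities(mu1=0.0, sd1=1.0, mu2=math.sqrt(2.0), sd2=1.0)
```

When the two means coincide, \(\varDelta_x = 0\) and the probability equals 1/2, matching the null value \(\pi_{x1} = 1/2\).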
© 2018 Springer Nature Switzerland AG
Cite this chapter
Matsouaka, R.A., Singhal, A.B., Betensky, R.A. (2018). Optimal Weighted Wilcoxon–Mann–Whitney Test for Prioritized Outcomes. In: Zhao, Y., Chen, DG. (eds) New Frontiers of Biostatistics and Bioinformatics. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-99389-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99388-1
Online ISBN: 978-3-319-99389-8