Abstract
Weighted logrank tests are the usual tool for detecting late effects in clinical trials. Weights determine the alternative hypotheses against which the tests are optimal. Choosing a specific weight is thus a crucial issue in practice. One common weight was introduced in 1982 by Harrington and Fleming. The corresponding test is implemented in standard statistical software packages. However, using this test in randomized controlled clinical trials raises two major and still unresolved difficulties. First, the weight depends on a parameter q that has to be set before collecting the data. Second, the necessary sample size depends on this q. This article addresses these difficulties. We provide the explicit form of the alternative hypothesis under which the Fleming–Harrington test for late effects is optimal in terms of Pitman’s asymptotic relative efficiency. Using simulations, we investigate various aspects of the Fleming–Harrington test for late effects, such as power properties and sensitivity to the value of q. We also investigate the relation between q and the necessary sample size for the Fleming–Harrington test. Based on these results, we propose q = 3 as a general choice for testing late effects. We illustrate our methodology on a data set arising from a prevention trial in the field of dementia.
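To make the role of q concrete, the following is a minimal sketch of a two-sample weighted logrank statistic with the Harrington and Fleming G^{p,q} weight, w(t) = S(t−)^p (1 − S(t−))^q, where S is the pooled Kaplan–Meier estimator. Setting p = 0 and q > 0 gives the late-effects weight discussed above, and p = q = 0 recovers the ordinary logrank test. The function name `fh_weighted_logrank` and its arguments are illustrative assumptions, not code from the article; the default q = 3 simply mirrors the value proposed in the abstract.

```python
import numpy as np

def fh_weighted_logrank(time, event, group, p=0.0, q=3.0):
    """Standardized G^{p,q} weighted logrank statistic (hypothetical helper).

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if censored
    group : 0/1 group labels; the score is computed for group 1
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=int)

    surv_left = 1.0  # pooled Kaplan-Meier estimate just before the current time
    score, variance = 0.0, 0.0
    for t in np.unique(time[event]):          # distinct event times, ascending
        at_risk = time >= t
        n = at_risk.sum()                     # total at risk
        n1 = (at_risk & (group == 1)).sum()   # at risk in group 1
        dead = event & (time == t)
        d = dead.sum()                        # total events at t
        d1 = (dead & (group == 1)).sum()      # events in group 1 at t

        # Fleming-Harrington weight evaluated at S(t-)
        w = surv_left**p * (1.0 - surv_left)**q
        score += w * (d1 - d * n1 / n)        # observed minus expected
        if n > 1:                             # hypergeometric variance term
            variance += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        surv_left *= 1.0 - d / n              # update Kaplan-Meier after t
    return score / np.sqrt(variance)
```

Under the null hypothesis the returned value is approximately standard normal in large samples, so a two-sided test compares its absolute value with a normal quantile. Because the weight is zero at the first event time when q > 0, very small samples can yield a zero variance; this sketch does not guard against that case.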
References
Andrieu, S., N. Coley, S. Lovestone, P. S. Aisen, and B. Vellas. 2015. Prevention of sporadic Alzheimer’s disease: Lessons learned from clinical trials and future directions. Lancet Neurology 14 (9):926–44.
Andrieu, S., S. Gillette, K. Amouyal, F. Nourhashemi, E. Reynish, P. J. Ousset, J. L. Albarede, B. Vellas, and H. Grandjean. 2003. Association of Alzheimer’s disease onset with ginkgo biloba and other symptomatic cognitive treatments in a population of women aged 75 years and older from the EPIDOS study. Journals of Gerontology Series A, Biological Sciences and Medical Sciences 58 (4):372–77.
Andrieu, S., P. J. Ousset, N. Coley, M. Ouzid, H. Mathiex-Fortunet, and B. Vellas. 2008. GuidAge study: A 5-year double blind, randomised trial of EGb 761 for the prevention of Alzheimer’s disease in elderly subjects with memory complaints. I. Rationale, design and baseline data. Current Alzheimer Research 5 (4):406–15.
Billingsley, P. 1999. Convergence of probability measures, 2nd ed. Wiley Series in Probability and Statistics: Probability and Statistics. New York, NY: John Wiley & Sons.
Breslow, N. E., L. Edler, and J. Berger. 1984. A two-sample censored-data rank test for acceleration. Biometrics 40 (4):1049–62.
Brookmeyer, R. 2007. Forecasting the global burden of Alzheimer’s disease. Alzheimer’s and Dementia 3 (3):186–91.
Brookmeyer, R., S. Gray, and C. Kawas. 1998. Projections of Alzheimer’s disease in the United States and the public health impact of delaying disease onset. American Journal of Public Health 88 (9):1337–42.
Buyske, S., R. Fagerstrom, and Z. Ying. 2000. A class of weighted log-rank tests for survival data when the event is rare. Journal of the American Statistical Association 95 (449):249–58.
Cox, D. R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society Series B 34:187–220.
DeKosky, S. T. 2008. Ginkgo biloba for prevention of dementia: A randomized controlled trial. Journal of the American Medical Association 300 (19):2253–62.
Eng, K. H., and M. R. Kosorok. 2005. A sample size formula for the supremum log-rank statistic. Biometrics 61 (1):86–91.
Fleming, T. R., and D. P. Harrington. 1991. Counting processes and survival analysis. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. New York, NY: John Wiley & Sons.
Fleming, T. R., D. P. Harrington, and M. O’Sullivan. 1987. Supremum versions of the log-rank and generalized Wilcoxon statistics. Journal of the American Statistical Association 82 (397):312–20.
Garès, V. 2014. Améliorer la performance des analyses de survie dans le cadre des essais de prévention et application à la maladie d’Alzheimer. PhD thesis, Université de Toulouse, Toulouse, France. https://doi.org/thesesups.ups-tlse.fr/2393/1/2014TOU30048.pdf
Garès, V., S. Andrieu, J.-F. Dupuy, and N. Savy. 2013. Comparison of constant piecewise weighted test and Fleming Harrington’s test – Application in clinical trials. Electronic Journal of Statistics 8 (1):841–60.
Garès, V., S. Andrieu, J.-F. Dupuy, and N. Savy. 2015. An omnibus test for several hazard alternatives in prevention randomized controlled clinical trials. Statistics in Medicine 34 (4):541–57.
Gastwirth, J. L. 1985. The use of maximin efficiency robust tests in combining contingency tables and survival analysis. Journal of the American Statistical Association 80 (390):381–84.
Gehan, E. A. 1965. A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52:203–23.
Gill, R. 1980. Censoring and stochastic integrals. Mathematical Centre Tracts 124. Amsterdam, The Netherlands: Mathematisch Centrum.
Halperin, M., E. Rogot, J. Gurian, and F. Ederer. 1967. Sample sizes for medical trials with special reference to long-term therapy. Biometrics 21:13–24.
Harrington, D. P., and T. R. Fleming. 1982. A class of rank test procedures for censored survival data. Biometrika 69 (3):553–66.
Jung, S. H. 2008. Sample size calculation for the weighted rank statistics with paired survival data. Statistics in Medicine 27 (17):3350–65.
Kosorok, M. R. 2008. Introduction to empirical processes and semiparametric inference. Springer Series in Statistics. New York, NY: Springer.
Kosorok, M. R., and C. Y. Lin. 1999. The versatility of function-indexed weighted log-rank statistics. Journal of the American Statistical Association 94 (445):320–32.
Lai, T. L., and Z. Ying. 1991. Rank regression methods for left-truncated and right-censored data. Annals of Statistics 19 (2):531–56.
Lakatos, E., and K. G. Lan. 1992. A comparison of sample size methods for the logrank statistic. Statistics in Medicine 11:179–91.
Lee, J. W. 1996. Some versatile tests based on the simultaneous use of weighted log-rank statistics. Biometrics 52 (2):721–25.
Lyketsos, C. G. 2007. Naproxen and celecoxib do not prevent Alzheimer’s disease in early results from a randomized controlled trial. Neurology 68 (21):1800–1808.
Machin, D., M. J. Campbell, T. S. Beng, and T. S. Huey. 2009. Sample size tables for clinical studies. New York, NY: John Wiley & Sons.
Mantel, N., and W. Haenszel. 1959. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22:719–48.
Martinussen, T., and T. H. Scheike. 2006. Dynamic regression models for survival data. Statistics for Biology and Health. New York, NY: Springer.
Pecková, M., and T. R. Fleming. 2003. Adaptive test for testing the difference in survival distributions. Lifetime Data Analysis 9 (3):223–38.
Peto, R., and J. Peto. 1972. Asymptotically efficient rank invariant test procedures. Journal of the Royal Statistical Society Series A 135:185–206.
Prentice, R. L. 1978. Linear rank tests with right censored data. Biometrika 65 (1):167–79.
Scherrer, B., S. Andrieu, P. J. Ousset, G. Berrut, J. F. Dartigues, B. Dubois, F. Pasquier, F. Piette, P. Robert, J. Touchon, P. Garnier, H. Mathiex-Fortunet, B. Vellas, and the GuidAge Study Group. 2015. Analysing time to event data in dementia prevention trials: The example of the GuidAge study of EGb761. Journal of Nutrition Health and Aging 19 (10):1009–11.
Schork, M. A., and R. D. Remington. 1967. The determination of sample size in treatment-control comparisons for chronic disease studies in which noncompliance or nonadherence is a problem. Journal of Chronic Diseases 20:233–39.
Self, S. G. 1991. An adaptive weighted log-rank test with application to cancer prevention and screening trials. Biometrics 47 (3):975–86.
Shumaker, S. A. 2003. Estrogen plus progestin and the incidence of dementia and mild cognitive impairment in postmenopausal women: The Women’s Health Initiative Memory Study: A randomized controlled trial. Journal of the American Medical Association 289 (20):2651–62.
Shumaker, S. A. 2004. Conjugated equine estrogens and incidence of probable dementia and mild cognitive impairment in postmenopausal women: Women’s Health Initiative Memory Study. Journal of the American Medical Association 291 (24):2947–58.
Tarone, R. E., and J. Ware. 1977. On distribution-free tests for equality of survival distributions. Biometrika 64 (1):156–60.
Van der Vaart, A. W. 1998. Asymptotic statistics, Vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge, UK: Cambridge University Press.
Vellas, B., N. Coley, P. J. Ousset, G. Berrut, J. F. Dartigues, B. Dubois, H. Grandjean, F. Pasquier, F. Piette, P. Robert, J. Touchon, P. Garnier, H. Mathiex-Fortunet, and S. Andrieu for the GuidAge Study Group. 2012. Long-term use of standardised ginkgo biloba extract for the prevention of Alzheimer’s disease (GuidAge): A randomised placebo-controlled trial. Lancet Neurology 11:851–59.
Wallenstein, S., and A. Berger. 1997. Weighted logrank tests to detect a transient improvement in survivorship. Biometrics 53 (2):736–44.
Wimo, A., and M. Prince. 2010. World Alzheimer report 2010: The global economic impact of dementia. London, UK: Alzheimer’s Disease International.
Wu, L., and P. B. Gilbert. 2002. Flexible weighted log-rank tests optimal for detecting early and/or late survival differences. Biometrics 58 (4):997–1004.
Yang, S., and R. Prentice. 2010. Improved logrank-type tests for survival data using adaptive weights. Biometrics 66 (1):30–38.
Zucker, D. M. 1992. The efficiency of a weighted log-rank test under a percent error misspecification model for the log hazard ratio. Biometrics 48 (3):893–99.
Zucker, D. M., and E. Lakatos. 1990. Weighted log rank type statistics for comparing survival curves when there is a time lag in the effectiveness of treatment. Biometrika 77 (4):853–64.
Additional information
Supplemental data for this article can be accessed on the publisher’s website.
About this article
Cite this article
Garès, V., Andrieu, S., Dupuy, JF. et al. On the Fleming–Harrington test for late effects in prevention randomized controlled trials. J Stat Theory Pract 11, 418–435 (2017). https://doi.org/10.1080/15598608.2017.1295889
DOI: https://doi.org/10.1080/15598608.2017.1295889
Keywords
- Hypothesis test
- survival data analysis
- weighted logrank tests
- asymptotic relative efficiency
- sample size calculation
- prevention trial