Inference Guided Data Exploration
Abstract
We consider comparing two treatments using a given hypothesis test on the full sample and on all possible subsets, and we separately consider restricting the subsets to those defined by half-intervals of a covariate. Rather than treating this as a family of hypothesis tests, we choose the minimum p-value from the group of hypothesis tests as our test statistic. Simulation is employed to find an approximate critical value that controls the type I error for this novel test statistic. These techniques may be used as a rule of thumb for judging the potential significance of a result after a “fishing expedition” has been carried out on a dataset, i.e., a large number of hypothesis tests were performed on subsets of the data, or a subset was selected after inspecting the data. When the technique is restricted to subsets defined by half-intervals of a covariate, it may be useful as a planned methodology for analyzing an experiment.
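The half-interval version of the procedure can be sketched roughly as follows. This is a minimal illustration, not the chapter's implementation: the large-sample z-test standing in for the "given hypothesis test", the use of label permutation as the simulation scheme, and all function and parameter names (`two_sample_p`, `min_p_half_intervals`, `critical_value`, `min_per_arm`) are assumptions made for the example.

```python
import math
import random


def two_sample_p(x, y):
    """Two-sided p-value of a large-sample z-test (an assumed stand-in test)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    se = math.sqrt(vx / len(x) + vy / len(y))
    if se == 0:
        return 1.0
    z = (mx - my) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def min_p_half_intervals(cov, treat, resp, min_per_arm=5):
    """Minimum p-value over the full sample and all half-interval subsets.

    Subsets are the k smallest and the k largest covariate values, which
    matches {cov <= c} and {cov >= c} when the covariate has no ties.
    """
    rows = sorted(zip(cov, treat, resp))
    n = len(rows)
    best = 1.0
    for k in range(1, n + 1):  # k == n is the full sample
        for subset in (rows[:k], rows[n - k:]):
            a = [r for _, t, r in subset if t == 0]
            b = [r for _, t, r in subset if t == 1]
            if len(a) >= min_per_arm and len(b) >= min_per_arm:
                best = min(best, two_sample_p(a, b))
    return best


def critical_value(cov, treat, resp, alpha=0.05, n_sim=200, seed=0):
    """Approximate alpha-level critical value for the minimum p-value.

    Assumption: the null distribution is simulated by permuting the
    treatment labels; other simulation schemes are possible.
    """
    rng = random.Random(seed)
    labels = list(treat)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(labels)
        sims.append(min_p_half_intervals(cov, labels, resp))
    sims.sort()
    return sims[max(0, math.floor(alpha * n_sim) - 1)]
```

Under these assumptions, the observed minimum p-value would be declared significant only if it falls at or below the simulated critical value, which is typically much smaller than the nominal level because the minimum is taken over many overlapping subsets.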
Keywords and phrases
Multiple subset testing, selecting population, fishing expedition