Inference Guided Data Exploration

  • Greg Yothers
  • Allan R. Sampson
Part of the Statistics for Industry and Technology book series (SIT)

Abstract

We consider comparing two treatments by applying a given hypothesis test to the full sample and to all possible subsets; separately, we consider restricting the subsets to those defined by half-intervals of a covariate. Rather than treating these as a family of hypothesis tests, we take the minimum p-value across the tests as our test statistic. Simulation is used to find an approximate critical value that controls the type I error of this novel test statistic. These techniques may serve as a rule of thumb for judging the potential significance of a result after a “fishing expedition” has been carried out on a dataset, i.e., after a large number of hypothesis tests were performed on subsets of the data or a subset was selected after inspecting the data. When restricted to subsets defined by half-intervals of a covariate, the technique may be useful as a planned methodology for analyzing an experiment.
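
The half-interval variant described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the "given hypothesis test" is taken to be a large-sample two-sided z-test, the simulation under the null is done by permuting treatment labels, and the names `two_sample_p`, `min_p_half_intervals`, `critical_value` and the `min_per_arm` parameter are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_p(x, y):
    """Two-sided p-value from a large-sample z-test (assumed stand-in test)."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    if se == 0:
        return 1.0
    z = (mean(x) - mean(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def min_p_half_intervals(outcome, treat, covariate, min_per_arm=5):
    """Minimum p-value over the half-intervals {covariate <= c} and
    {covariate >= c}; the full sample is included as the widest interval."""
    rows = list(zip(covariate, treat, outcome))
    best = 1.0
    for c in sorted(set(covariate)):
        lower = [r for r in rows if r[0] <= c]
        upper = [r for r in rows if r[0] >= c]
        for subset in (lower, upper):
            a = [o for _, t, o in subset if t == 1]
            b = [o for _, t, o in subset if t == 0]
            # Skip subsets too small to test (hypothetical safeguard).
            if len(a) >= min_per_arm and len(b) >= min_per_arm:
                best = min(best, two_sample_p(a, b))
    return best


def critical_value(outcome, treat, covariate, alpha=0.05, n_sim=200, seed=0):
    """Approximate critical value for the minimum p-value, obtained by
    permuting treatment labels (an assumed simulation scheme)."""
    rng = random.Random(seed)
    labels = list(treat)
    null_stats = []
    for _ in range(n_sim):
        rng.shuffle(labels)
        null_stats.append(min_p_half_intervals(outcome, labels, covariate))
    null_stats.sort()
    # Lower alpha-quantile of the simulated null distribution.
    return null_stats[max(int(alpha * n_sim) - 1, 0)]
```

Under these assumptions, a difference would be declared significant when the observed minimum p-value is at or below the simulated critical value, which will generally be smaller than the nominal level `alpha` because the statistic is the minimum over many correlated tests.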

Keywords and phrases

Multiple subset testing selecting population fishing expedition 

Copyright information

© Birkhäuser Boston 2005

Authors and Affiliations

  • Greg Yothers (1, 2)
  • Allan R. Sampson (1, 2)
  1. National Surgical Adjuvant Breast and Bowel Project (NSABP), Pittsburgh, USA
  2. Department of Statistics, University of Pittsburgh, USA
