Abstract
A typical microarray experiment involves comparisons of hundreds or thousands of genes. Because so many genes are compared, applying a significance test without adjusting for multiple comparisons can lead to a high probability of false positive findings. In this context, Tsai et al. (Biometrics 59:1071–1081, 2003) presented a model for studying the overall error rate when testing multiple hypotheses. The model involves the distribution of the sum of non-independent Bernoulli trials, which they approximate using a beta-binomial structure. In this paper, instead of using a beta-binomial model, we derive the exact distribution of the sum of non-independent and non-identically distributed Bernoulli random variables. The resulting distribution is used to compute the conditional false discovery rates, and the results are compared with those reported in Table 3 of Tsai et al. (Biometrics 59:1071–1081, 2003).
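As a rough illustration of the beta-binomial structure mentioned above, the sketch below computes the probability mass function of a sum of exchangeable, positively correlated Bernoulli trials under a beta-binomial model. It is not the exact distribution derived in the paper, and the function names (`beta_binomial_pmf`, `params_from_mean_corr`) and the numerical values of `n`, `p`, and `rho` are assumptions chosen purely for illustration.

```python
from math import lgamma, exp


def log_beta(x, y):
    """Logarithm of the beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_binomial_pmf(k, n, a, b):
    """P(S = k) when S follows a beta-binomial(n, a, b) distribution."""
    if not 0 <= k <= n:
        return 0.0
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def params_from_mean_corr(p, rho):
    """Map a marginal success probability p and a pairwise correlation
    rho (0 < rho < 1) to beta parameters, using rho = 1 / (a + b + 1)."""
    total = 1.0 / rho - 1.0
    return p * total, (1.0 - p) * total


# Illustrative values only (assumed, not taken from the paper).
n, p, rho = 20, 0.05, 0.1
a, b = params_from_mean_corr(p, rho)

pmf = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print("mean of the sum:", mean)
print("variance of the sum:", var)
print("variance under independence:", n * p * (1 - p))
```

Under this model the mean of the sum is n·p, as in the independent case, while the variance is inflated by the factor 1 + (n − 1)·rho. That inflation is why treating dependent tests as independent can misstate the chance of a given number of false positives.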
References
Altham PME (1978) Two generalizations of the binomial distribution. Appl Stat 27: 162–167
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57: 289–300
Benjamini Y, Liu W (1999) A step down multiple hypotheses testing procedure that controls the false discovery rate under independence. J Stat Plan Inf 82: 163–170
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependence. Ann Stat 29: 1165–1188
Bowman D, George EO (1995) A saturated model for analyzing exchangeable binary data: applications to clinical and developmental toxicity studies. J Am Stat Assoc 90: 871–879
George EO, Bowman D (1995) A full likelihood procedure for analyzing exchangeable binary data. Biometrics 51: 512–523
George EO, Kodell RL (1996) Tests of independence, treatment heterogeneity, and dose-related trend with exchangeable binary data. J Am Stat Assoc 91: 1602–1610
Lancaster HO (1969) The Chi-squared distribution. Wiley, London
Storey JD (2002) A direct approach to false discovery rates. J R Stat Soc Ser B 64: 479–498
Tsai C, Hsueh H, Chen JJ (2003) Estimation of false discovery rates in multiple testing: application to gene microarray data. Biometrics 59: 1071–1081
Yu C, Zelterman D (2002) Sums of dependent Bernoulli random variables and disease clustering. Stat Probab Lett 57: 363–373
Zelterman D (2004) Discrete distributions: applications in the health sciences. Wiley, New York
Cite this article
Gupta, R.C., Tao, H. A generalized correlated binomial distribution with application in multiple testing problems. Metrika 71, 59–77 (2010). https://doi.org/10.1007/s00184-008-0202-7
DOI: https://doi.org/10.1007/s00184-008-0202-7