Abstract
Time-course microarray experiments harvest samples at several time points. To reveal dynamic changes in gene expression over time, we need to identify the significant genes and detect their expression patterns, a task that may incur directional errors. Guo et al. (Biometrics 66(2):485–492, 2010) introduced a procedure that controls the mixed directional false discovery rate (mdFDR), that is, the sum of the expected proportions of Type I and Type III errors among all rejections. In this paper, we develop weighted p value procedures for mdFDR control and give sufficient conditions that ensure (asymptotic) mdFDR control. Several weights and their estimators are shown to satisfy these conditions. The proposed weighted p value procedures are compared with the existing method in extensive simulations. Based on the proposed weighted p value procedures, we provide multiple confidence intervals (CIs) that control the false coverage-statement rate (FCR). We use the proposed methods to analyze the time-course microarray data studied in Lobenhofer et al. (Mol Endocrinol 16:1215–1229, 2002). Most of our findings agree with those obtained by the existing method. In addition, we identify some other important genes, such as CDKN3 and NQO1.
References
Arbeitman M, Furlong E, Imam F, Johnson E, Null B, Baker B, Krasnow M, Scott M, Davis R, White K (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297(5590):2270–2275
Asher G, Lotem J, Kama R, Sachs L, Shaul Y (2002) NQO1 stabilizes p53 through a distinct pathway. Proc Nat Acad Sci 99(1):3099–3104
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57(1):289–300
Benjamini Y, Hochberg Y (1997) Multiple hypotheses testing with weights. Scand J Stat 24(3):407–418
Benjamini Y, Hochberg Y (2000) On the adaptive control of the false discovery rate in multiple testing with independent statistics. J Educ Behav Stat 25(1):60–83
Benjamini Y, Yekutieli D (2005) False discovery rate-adjusted multiple confidence intervals for selected parameters. J Am Stat Assoc 100:71–80
Benjamini Y, Krieger A, Yekutieli D (2006) Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3):491–507
Blanchard G, Roquain E (2009) Adaptive FDR control under independence and dependence. J Mach Learn 29:2837–2871
Ernst J, Nau GJ, Bar-Joseph Z (2005) Clustering short time series gene expression data. Bioinformatics 21:i159–i168
Finner H, Gontscharuk V (2009) Controlling the familywise error rate with plug-in estimator for the proportion of true null hypotheses. J R Stat Soc B 71(5):1031–1048
Genovese C, Wasserman L (2004) A stochastic process approach to false discovery control. Ann Stat 32(3):1035–1061
Genovese C, Roeder K, Wasserman L (2006) False discovery control with \(p\) value weighting. Biometrika 93(3):509–524
Gui J, Tosteson TD, Borsuk M (2012) Weighted multiple testing procedures for genomic studies. BioData Min 5(1):4–13
Guillemin K, Salama N, Tompkins L, Falkow S (2002) Cag pathogenicity island-specific responses of gastric epithelial cells to Helicobacter pylori infection. Proc Natl Acad Sci USA 99:15136–15141
Guo W, Sarkar SK, Peddada SD (2010) Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics 66(2):485–492
Hu JX, Zhao H, Zhou H (2010) False discovery rate control with groups. J Am Stat Assoc 105(491):1215–1227
Jin J, Cai T (2007) Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J Am Stat Assoc 102:495–506
Lee S, Reimer CL, Fang L, Iruela-Arispe LM, Aaronson SA (2000) Overexpression of kinase-associated phosphatase (KAP) in breast and prostate cancer and inhibition of the transformed phenotype by antisense KAP expression. Mol Cell Biol 20(5):1723–1732
Lobenhofer E, Bennett L, Cable P, Li L, Bushel P, Afshari C (2002) Regulation of DNA replication fork genes by 17beta-estradiol. Mol Endocrinol 16:1215–1229
Meinshausen N, Rice J (2006) Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann Stat 34(1):373–393
Peddada SD, Lobenhofer E, Li L, Afshari C, Weinberg C, Umbach D (2003) Gene selection and clustering for time-course and dose response microarray experiments using order-restricted inference. Bioinformatics 19:834–841
Roeder K, Wasserman L (2009) Genome-wide significance levels and weighted hypothesis testing. Stat Sci 24(4):398–413
Sarkar SK, Guo W, Finner H (2012) On adaptive procedures controlling the familywise error rate. J Stat Plan Inference 142(3):65–78
Simes R (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73(3):751–754
Storey J (2002) A direct approach to false discovery rates. J R Stat Soc B 64(3):479–498
Sun W, Cai T (2007) Oracle and adaptive compound decision rules for false discovery rate control. J Am Stat Assoc 102:901–912
Tian B, Nowak D, Brasier A (2005) A TNF-induced gene expression program under oscillatory NF-kappaB control. BMC Genom 6(73):137–137
Wang L, Ramoni M, Sebastiani P (2006) Clustering short gene expression profiles. Lect Notes Comput Sci 3909:60–68
Wang L, Montano M, Rarick M, Sebastiani P (2008) Conditional clustering of temporal expression profiles. BMC Bioinform 9:147–147
Acknowledgements
The authors are grateful to two anonymous referees and an editor for constructive comments and suggestions, which have led to substantial improvement in the paper.
Additional information
Haibing Zhao’s work was supported by a Grant from the National Natural Science Foundation of China (NSFC) (No. 11471204).
Appendix
Proof of Theorem 1
We prove the result following Guo et al. (2010). Without loss of generality, we suppose that the non-zero components of \({\varvec{\mu }}_i=(\mu _{i1},\ldots ,\mu _{iq})\) are all positive. Then
where \(I(\cdot )\) is the indicator function and \(R^{(-i)}\) is the number of rejections, not counting the rejection of \(H_i\), in the BH step-up procedure with \(P_{ij}\) replaced by 0. Note that replacing \(P_{ij}\) by 0 gives \(P_{i}=0\). The proof is completed. \(\square \)
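The quantity \(R^{(-i)}\) can be made concrete with a small sketch. The code below is only an illustration: it assumes the screening p values \(P_i\) are already available as a plain list, and the function names and the numerical values are hypothetical, not taken from the paper.

```python
from typing import Sequence


def bh_num_rejections(pvals: Sequence[float], alpha: float) -> int:
    """Number of rejections of the BH step-up procedure at level alpha."""
    m = len(pvals)
    ordered = sorted(pvals)
    # Step-up rule: find the largest k with p_(k) <= k * alpha / m.
    k_max = 0
    for k, p in enumerate(ordered, start=1):
        if p <= k * alpha / m:
            k_max = k
    return k_max


def rejections_excluding(pvals: Sequence[float], i: int, alpha: float) -> int:
    """R^{(-i)}: set P_i = 0, run BH, and do not count the rejection of H_i."""
    modified = list(pvals)
    modified[i] = 0.0
    # With P_i = 0, H_i is always rejected, so remove it from the count.
    return bh_num_rejections(modified, alpha) - 1


# Made-up screening p values, for illustration only.
screening_p = [0.001, 0.012, 0.025, 0.2, 0.6]
r = bh_num_rejections(screening_p, alpha=0.05)
r_minus_first = rejections_excluding(screening_p, i=0, alpha=0.05)
r_minus_last = rejections_excluding(screening_p, i=4, alpha=0.05)
```

In this toy example \(H_1\) is rejected by the original procedure, and the total number of rejections equals \(R^{(-1)}+1\). This is the usual reason for conditioning on \(R^{(-i)}\): it does not depend on \(P_i\).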
Proof of Theorem 2
Without loss of generality, we suppose the non-zero components of \({\varvec{\mu }}_i\) are all positive. Then
where \(\mathbf{P}^{(-ij)}\) is the collection of \(P_{i^{\prime }j^{\prime }}, i^{\prime }=1,\ldots ,m, j^{\prime }=1,\ldots ,q,\) excluding \(P_{ij}\).
The proof is completed. \(\square \)
Proof of Theorem 3
Without loss of generality, we assume \(H_{ij}, i=1,\ldots ,m_{0j},\) to be true for each j. Note that, for \(i=1,\ldots ,m_{0j},\)
and
where Bin(a, b) is the binomial distribution with parameters (a, b). Suppose \(m_{0j}>0\). Then we have
where the subscript DU of the expectation means that it is calculated under the Dirac-uniform configuration, which assumes that the p values corresponding to the false null hypotheses are 0 and the p values corresponding to the true null hypotheses are i.i.d. U(0, 1). The last inequality in Equation (7.1) holds by Sarkar et al. (2012), and the last inequality but one follows from \(m_{0j}\ge (1-\lambda )m\widehat{\pi }_{0j,-1j}\) under the Dirac-uniform configuration. Thus, the weights \(\widehat{w}^a_{ij}\) satisfy \(\sum _{i,j}(1+I(\mu _{ij}=0))E\widehat{w}^a_{ij,-ij}\le 2mq\). Obviously, \(\widehat{w}^a_{ij}\le \widehat{w}^a_{ij,-ij}\). By Theorem 2, the data-driven Apro1 controls the mdFDR. The proof is completed. \(\square \)
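Arguments of this kind typically rest on a standard binomial fact: if \(W\sim \mathrm{Bin}(n,p)\), then \(E[1/(1+W)]=\{1-(1-p)^{n+1}\}/\{(n+1)p\}\le 1/\{(n+1)p\}\). The sketch below checks this identity numerically. Taking \(n=m_{0j}-1\) and \(p=1-\lambda \) is our assumption about how the binomial variable enters here, and the values of `m0` and `lam` are illustrative only.

```python
from math import comb


def expected_inverse_one_plus_binomial(n: int, p: float) -> float:
    """Exact E[1 / (1 + W)] for W ~ Bin(n, p), by direct summation."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) / (k + 1) for k in range(n + 1)
    )


def closed_form(n: int, p: float) -> float:
    """Closed form (1 - (1 - p)^(n + 1)) / ((n + 1) * p)."""
    return (1 - (1 - p) ** (n + 1)) / ((n + 1) * p)


m0, lam = 20, 0.5        # illustrative values, not from the paper
n, p = m0 - 1, 1 - lam   # assumed: W counts the other true-null p values above lambda
exact = expected_inverse_one_plus_binomial(n, p)
bound = 1 / (m0 * (1 - lam))
```

Here `exact` agrees with `closed_form(n, p)` up to rounding and does not exceed `bound`, the kind of inequality that bounds the expected inverse of a plug-in estimator of the proportion of true nulls.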
Proof of Theorem 5
Without loss of generality, we suppose the non-zero components of \({\varvec{\mu }}_i\) are all positive. Then
where \(\widehat{w}_{ij,i}=\sup _{0\le P_{ij}\le 1,j=1,\ldots ,q}\widehat{w}_{ij}\), and \(R^{(-i)}\) is the number of rejections, not counting the rejection of \(H_i\), in the first step of the GW procedure with \(P_{ij}\) replaced by 0. Then
The proof is completed. \(\square \)
Cite this article
Zhao, H., Fung, W.K. Controlling mixed directional false discovery rate in multidimensional decisions with applications to microarray studies. TEST 27, 316–337 (2018). https://doi.org/10.1007/s11749-017-0547-1
DOI: https://doi.org/10.1007/s11749-017-0547-1