Controlling mixed directional false discovery rate in multidimensional decisions with applications to microarray studies

Zhao, Haibing; Fung, Wing Kam

doi:10.1007/s11749-017-0547-1


Original Paper
Published: 23 June 2017

TEST, Volume 27, pages 316–337 (2018)


Abstract

Time-course microarray experiments harvest samples at several time points. To reveal dynamic changes in gene expression over time, we need to identify the significant genes and detect their expression patterns, a process that may introduce directional errors. Guo et al. (Biometrics 66(2):485–492, 2010) introduced a procedure that controls the mixed directional false discovery rate (mdFDR), the sum of the expected proportions of Type I and Type III errors among all rejections. In this paper, we develop weighted p value procedures for mdFDR control and give sufficient conditions that ensure (asymptotic) mdFDR control. Several weights and their estimators are shown to satisfy these sufficient conditions. The proposed weighted p value procedures are compared with the existing method through extensive simulations. Based on the proposed weighted p value procedure, we construct multiple confidence intervals (CIs) that control the false coverage-statement rate (FCR). We apply the proposed methods to the time-course microarray data studied in Lobenhofer et al. (Mol Endocrinol 16:1215–1229, 2002). Most of our findings agree with those obtained by the existing method; in addition, we identify other important genes, such as CDKN3 and NQO1.
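As an illustration only, here is a minimal Python sketch of a two-step weighted directional procedure of the kind described above, reconstructed from the thresholds $w_{ij}r\alpha/(mq)$ that appear in the appendix proofs. It is not the authors' code. Several details are assumptions: the gene-level p value is taken to be the weighted Bonferroni combination $P_i=\min\{1, q\min_j P_{ij}/w_{ij}\}$, step 1 is the BH step-up procedure on the $P_i$, and directions are declared by the sign of $T_{ij}$. The function name `weighted_two_step` is hypothetical.

```python
from typing import List, Sequence, Tuple


def weighted_two_step(p: Sequence[Sequence[float]],
                      t: Sequence[Sequence[float]],
                      w: Sequence[Sequence[float]],
                      alpha: float = 0.05) -> Tuple[int, List[List[int]]]:
    """Hypothetical two-step weighted directional procedure (sketch, not the authors' code).

    p[i][j]: p value P_ij; t[i][j]: statistic T_ij; w[i][j]: positive weight w_ij.
    Returns R (genes selected in step 1) and a matrix of decisions:
    +1 / -1 = H_ij rejected with that declared direction, 0 = not rejected.
    """
    m, q = len(p), len(p[0])
    # Assumed gene-level p value: weighted Bonferroni combination over the q components.
    p_gene = [min(1.0, q * min(p[i][j] / w[i][j] for j in range(q))) for i in range(m)]
    # Step 1: Benjamini-Hochberg step-up on the m gene-level p values.
    order = sorted(range(m), key=lambda i: p_gene[i])
    R = 0
    for k in range(m, 0, -1):
        if p_gene[order[k - 1]] <= k * alpha / m:
            R = k
            break
    decisions = [[0] * q for _ in range(m)]
    # Step 2: within each selected gene, reject H_ij when P_ij <= w_ij * R * alpha / (m q)
    # and declare the direction by the sign of T_ij.
    for i in order[:R]:
        for j in range(q):
            if p[i][j] <= w[i][j] * R * alpha / (m * q):
                decisions[i][j] = (t[i][j] > 0) - (t[i][j] < 0)
    return R, decisions


# Toy input: m = 4 genes, q = 2 components, unit weights.
p = [[0.001, 0.5], [0.2, 0.3], [0.004, 0.003], [0.9, 0.8]]
t = [[3.0, 0.5], [1.0, -1.0], [-2.5, 2.8], [0.1, 0.2]]
w = [[1.0, 1.0] for _ in range(4)]
R, decisions = weighted_two_step(p, t, w)
print(R, decisions)
```

On the toy input the first and third genes are selected ($R=2$). Under the assumed combination, $P_{ij}\le w_{ij}R\alpha/(mq)$ is equivalent to some component pushing $P_i$ below $R\alpha/m$, so every selected gene has at least one rejected component; it also means that setting $P_{ij}=0$ forces $P_i=0$, as used in the proofs.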


References

Arbeitman M, Furlong E, Imam F, Johnson E, Null B, Baker B, Krasnow M, Scott M, Davis R, White K (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297(5590):2270–2275
Asher G, Lotem J, Kama R, Sachs L, Shaul Y (2002) NQO1 stabilizes p53 through a distinct pathway. Proc Natl Acad Sci USA 99(1):3099–3104
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57(1):289–300
Benjamini Y, Hochberg Y (1997) Multiple hypotheses testing with weights. Scand J Stat 24(3):407–418
Benjamini Y, Hochberg Y (2000) On the adaptive control of the false discovery rate in multiple testing with independent statistics. J Educ Behav Stat 25(1):60–83
Benjamini Y, Yekutieli D (2005) False discovery rate-adjusted multiple confidence intervals for selected parameters. J Am Stat Assoc 100:71–80
Benjamini Y, Krieger A, Yekutieli D (2006) Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3):491–507
Blanchard G, Roquain E (2009) Adaptive FDR control under independence and dependence. J Mach Learn Res 29:2837–2871
Ernst J, Nau GJ, Bar-Joseph Z (2005) Clustering short time series gene expression data. Bioinformatics 21:i159–i168
Finner H, Gontscharuk V (2009) Controlling the familywise error rate with plug-in estimator for the proportion of true null hypotheses. J R Stat Soc B 71(5):1031–1048
Genovese C, Wasserman L (2004) A stochastic process approach to false discovery control. Ann Stat 32(3):1035–1061
Genovese C, Roeder K, Wasserman L (2006) False discovery control with p value weighting. Biometrika 93(3):509–524
Gui J, Tosteson TD, Borsuk M (2012) Weighted multiple testing procedures for genomic studies. BioData Min 5(1):4–13
Guillemin K, Salama N, Tompkins L, Falkow S (2002) cag pathogenicity island-specific responses of gastric epithelial cells to Helicobacter pylori infection. Proc Natl Acad Sci USA 99:15136–15141
Guo W, Sarkar SK, Peddada SD (2010) Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics 66(2):485–492
Hu JX, Zhao H, Zhou H (2010) False discovery rate control with groups. J Am Stat Assoc 105(491):1215–1227
Jin J, Cai T (2007) Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J Am Stat Assoc 102:495–506
Lee S, Reimer CL, Fang L, Iruela-Arispe LM, Aaronson SA (2000) Overexpression of kinase-associated phosphatase (KAP) in breast and prostate cancer and inhibition of the transformed phenotype by antisense KAP expression. Mol Cell Biol 20(5):1723–1732
Lobenhofer E, Bennett L, Cable P, Li L, Bushel P, Afshari C (2002) Regulation of DNA replication fork genes by 17β-estradiol. Mol Endocrinol 16:1215–1229
Meinshausen N, Rice J (2006) Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann Stat 34(1):373–393
Peddada SD, Lobenhofer E, Li L, Afshari C, Weinberg C, Umbach D (2003) Gene selection and clustering for time-course and dose response microarray experiments using order-restricted inference. Bioinformatics 19:834–841
Roeder K, Wasserman L (2009) Genome-wide significance levels and weighted hypothesis testing. Stat Sci 24(4):398–413
Sarkar SK, Guo W, Finner H (2012) On adaptive procedures controlling the familywise error rate. J Stat Plan Inference 142(3):65–78
Simes R (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73(3):751–754
Storey J (2002) A direct approach to false discovery rates. J R Stat Soc B 64(3):479–498
Sun W, Cai T (2007) Oracle and adaptive compound decision rules for false discovery rate control. J Am Stat Assoc 102:901–912
Tian B, Nowak D, Brasier A (2005) A TNF-induced gene expression program under oscillatory NF-κB control. BMC Genom 6(73):137–137
Wang L, Ramoni M, Sebastiani P (2006) Clustering short gene expression profiles. Lect Notes Comput Sci 3909:60–68
Wang L, Montano M, Rarick M, Sebastiani P (2008) Conditional clustering of temporal expression profiles. BMC Bioinform 9:147–147


Acknowledgements

The authors are grateful to two anonymous referees and the editor for their constructive comments and suggestions, which have led to substantial improvements in the paper.

Author information

Authors and Affiliations

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200433, China
Haibing Zhao
Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, Shanghai, 200433, China
Haibing Zhao
Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam Road, Hong Kong, China
Wing Kam Fung


Corresponding author

Correspondence to Haibing Zhao.

Additional information

Haibing Zhao’s work was supported by a grant from the National Natural Science Foundation of China (NSFC) (No. 11471204).

Appendix

Proof of Theorem 1

We prove the result by following the argument of Guo et al. (2010). Without loss of generality, we suppose that the nonzero components of ${\varvec{\mu }}_i=(\mu _{i1},\ldots ,\mu _{iq})$ are all positive. Then

$$\begin{aligned} mdFDR&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr\left\{ P_{ij}\le \frac{w_{ij}r}{mq} \alpha , R=r \right\} \\&\quad +I(\mu _{ij}>0)Pr\left\{ P_{ij}\le \frac{w_{ij}r}{mq} \alpha , R=r, T_{ij}<0 \right\} \bigg )\\&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0) +\frac{1}{2} I(\mu _{ij}>0)\bigg )Pr\left\{ R^{(-i)}=r-1 \right\} \frac{w_{ij}r}{mq} \alpha \\&=\sum _{j=1}^q\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)+\frac{1}{2} I(\mu _{ij}>0)\bigg ) \frac{w_{ij}}{mq} \alpha =\alpha , \end{aligned}$$

where $I(\cdot )$ is the indicator function and $R^{(-i)}$ is the number of rejections, excluding the rejection of $H_i$, made by the BH step-up procedure when $P_{ij}$ is replaced by 0. Note that replacing $P_{ij}$ by 0 makes $P_{i}=0$. The proof is completed. $\square $
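A small Monte Carlo sketch can make Theorem 1 concrete. It estimates the expression on the right of the first inequality above, that is, the expected number of Type I and Type III component errors divided by $\max(R,1)$, for a two-step weighted procedure. Everything specific here is an assumption rather than the paper's setup: independent $N(\mu_{ij},1)$ statistics with two-sided p values, the weighted Bonferroni gene-level p value $P_i=\min\{1,q\min_j P_{ij}/w_{ij}\}$, BH at step 1, a hypothetical design with 150 null genes and 50 genes with means $(3.5, 3.5, 0)$, and oracle weights 1 (null component) and 2 (nonzero component), which satisfy $\sum_{i,j}(1+I(\mu_{ij}=0))w_{ij}=2mq$ with equality. Under independence the estimate should land near $\alpha\cdot 500/600\approx 0.042$, below $\alpha=0.05$.

```python
import math
import random


def two_sided_p(t):
    # Two-sided p value of a N(0, 1) statistic: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    return math.erfc(abs(t) / math.sqrt(2.0))


def errors_over_r(mu, w, alpha, rng):
    """One replicate: (Type I + Type III component errors) / max(R, 1)."""
    m, q = len(mu), len(mu[0])
    t = [[mu[i][j] + rng.gauss(0.0, 1.0) for j in range(q)] for i in range(m)]
    p = [[two_sided_p(x) for x in row] for row in t]
    # Assumed gene-level p value: weighted Bonferroni combination over components.
    p_gene = sorted(min(1.0, q * min(p[i][j] / w[i][j] for j in range(q))) for i in range(m))
    # Step 1: number of BH rejections among the m genes.
    R = max((k for k in range(1, m + 1) if p_gene[k - 1] <= k * alpha / m), default=0)
    if R == 0:
        return 0.0
    errors = 0
    for i in range(m):
        for j in range(q):
            # Step 2 threshold. A component passing it forces P_i <= R * alpha / m,
            # so its gene was selected in step 1 and scanning all genes is equivalent.
            if p[i][j] <= w[i][j] * R * alpha / (m * q):
                type_one = mu[i][j] == 0
                type_three = mu[i][j] > 0 and t[i][j] < 0
                errors += type_one or type_three
    return errors / R


def simulate(reps=400, alpha=0.05, seed=2017):
    q = 3
    # Hypothetical design: 150 null genes and 50 genes with means (3.5, 3.5, 0).
    mu = [[0.0] * q for _ in range(150)] + [[3.5, 3.5, 0.0] for _ in range(50)]
    # Oracle weights: 1 for null and 2 for nonzero components, so that
    # sum_ij (1 + I(mu_ij = 0)) * w_ij == 2 * m * q holds with equality.
    w = [[1.0 if x == 0 else 2.0 for x in row] for row in mu]
    rng = random.Random(seed)
    return sum(errors_over_r(mu, w, alpha, rng) for _ in range(reps)) / reps


estimate = simulate()
print(f"estimated bound: {estimate:.4f} (alpha = 0.05)")
```

The value $\approx 0.042$ comes from the proof itself: the null components contribute $\sum_{\mu_{ij}=0}w_{ij}\alpha/(mq)=500\alpha/600$, while directional errors for means of 3.5 are negligible.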

Proof of Theorem 2

Without loss of generality, we suppose the non-zero components of ${\varvec{\mu }}_i$ are all positive. Then

$$\begin{aligned} \begin{aligned} mdFDR&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr \{P_{ij}\le \frac{ \widehat{w}_{ij}r}{mq} \alpha , R=r \}\\&\quad +I(\mu _{ij}>0)Pr\left\{ P_{ij}\le \frac{ \widehat{w}_{ij}r}{mq} \alpha , R=r, T_{ij}<0 \right\} \bigg )\\&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr\{P_{ij} \le \frac{ \widehat{w}_{ij,-ij}r}{mq} \alpha , R^{(-i)}=r-1 \}\\&\quad +I(\mu _{ij}>0)Pr\left\{ P_{ij}\le \frac{ \widehat{w}_{ij,-ij}r}{mq} \alpha , R^{(-i)}=r-1, T_{ij}<0 \right\} \bigg ) \\&=\sum _{r=1}^m\frac{1}{mq}\sum _{i=1}^m\sum _{j=1}^q E_{\mathbf{P}^{(-ij)}}\widehat{w}_{ij,-ij} \alpha \big [ I(\mu _{ij}=0)+\frac{1}{2} I(\mu _{ij}>0)\big ]\\&\quad Pr(R^{(-i)}=r-1|\mathbf{P}^{(-ij)}) \\ {}&=\sum _{j=1}^q\sum _{i=1}^m E_{\mathbf{P}^{(-ij)}}\widehat{w}_{ij,-ij}\frac{1}{2mq} \big [ I(\mu _{ij}=0)+ 1\big ]\alpha \le \alpha , \end{aligned} \end{aligned}$$

where $\mathbf{P}^{(-ij)}$ is the collection of $P_{i^{\prime }j^{\prime }}, i^{\prime }=1,\ldots ,m, j^{\prime }=1,\ldots ,q,$ excluding $P_{ij}$.

The proof is completed. $\square $
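The last equality in the display of this proof rests on an identity that holds because the nonzero components of ${\varvec{\mu }}_i$ have been taken to be positive, so that $I(\mu _{ij}=0)+I(\mu _{ij}>0)=1$. It also shows that the bound reduces to a condition on the expected weights:

$$\begin{aligned} I(\mu _{ij}=0)+\frac{1}{2} I(\mu _{ij}>0)&=I(\mu _{ij}=0)+\frac{1}{2}\big [1-I(\mu _{ij}=0)\big ]=\frac{1}{2}\big [1+I(\mu _{ij}=0)\big ],\\ \sum _{j=1}^q\sum _{i=1}^m E_{\mathbf{P}^{(-ij)}}\widehat{w}_{ij,-ij}\frac{1}{2mq}\big [ I(\mu _{ij}=0)+1\big ]\alpha \le \alpha &\iff \sum _{i,j}(1+I(\mu _{ij}=0))E_{\mathbf{P}^{(-ij)}}\widehat{w}_{ij,-ij}\le 2mq. \end{aligned}$$

For example, with fixed unit weights $w_{ij}\equiv 1$ and $N_0=\sum _{i,j}I(\mu _{ij}=0)$ (a label introduced only for this illustration), the left-hand side of the last condition equals $mq+N_0\le 2mq$, so the bound becomes $\alpha (mq+N_0)/(2mq)\le \alpha$.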

Proof of Theorem 3

Without loss of generality, for each $j$ we assume that $H_{ij}$, $i=1,\ldots ,m_{0j}$, are the true null hypotheses. Note that, for $i=1,\ldots ,m_{0j},$

$$\begin{aligned} \begin{aligned} \sum _{i^{\prime }=1,i^{\prime }\ne i}^{m_{0j}}I(P_{i^{\prime }j}> \lambda )+1\sim Bin(m_{0j}-1,1-\lambda ) \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \frac{(m+m_{0j})(1-\lambda )}{m(1-\lambda )+(1-\lambda )\widehat{\pi }_{0j,-ij}}\le \frac{(m+m_{0j})}{m+(1-\lambda )\widehat{\pi }_{0j,-ij}}, \end{aligned} \end{aligned}$$

where $Bin(a, b)$ denotes the binomial distribution with $a$ trials and success probability $b$. Suppose $m_{0j}>0$. Then

$$\begin{aligned}&\frac{1}{2}\sum _{i,j}(1+I(\mu _{ij}=0))E_{\mathbf{P}^{(-ij)}}\widehat{w}^a_{ij,-ij}\le E_{\mathbf{P}^{(-1j)}}\frac{m+m_{0j}}{1+\widehat{\pi }_{0j,-1j}}\\&\quad \le E_{(\mathbf{P}^{(-1j)},DU)}\frac{m+m_{0j}}{1+\widehat{\pi }_{0j,-1j}}\\&\quad \le m E_{(\mathbf{P}^{(-1j)},DU)}\frac{(m+m_{0j})(1-\lambda )}{m(1-\lambda )+(1-\lambda )m\widehat{\pi }_{0j,-1j}}\\&\quad \le m E_{(\mathbf{P}^{(-1j)},DU)}\frac{m+m_{0j}}{m+(1-\lambda )m\widehat{\pi }_{0j,-1j}}\\&\quad \le m E_{(\mathbf{P}^{(-1j)},DU)}\frac{m_{0j}}{(1-\lambda )m\widehat{\pi }_{0j,-1j}}\le m, \end{aligned}\tag{7.1}$$

where the subscript DU indicates that the expectation is computed under the Dirac-uniform configuration, in which the p values corresponding to the false null hypotheses are 0 and those corresponding to the true null hypotheses are i.i.d. $U(0, 1)$. The last inequality in Equation (7.1) follows from Sarkar et al. (2012), and the second-to-last inequality follows from $m_{0j}\ge (1-\lambda )m\widehat{\pi }_{0j,-1j}$ under the Dirac-uniform configuration. Thus, the weights $\widehat{w}^a_{ij}$ satisfy $\sum _{i,j}(1+I(\mu _{ij}=0))E\widehat{w}^a_{ij,-ij}\le 2mq$. Clearly, $\widehat{w}^a_{ij}\le \widehat{w}^a_{ij,-ij}$. Hence, by Theorem 2, the data-driven procedure Apro1 controls the mdFDR. The proof is completed. $\square $
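The closing step of (7.1) is credited to Sarkar et al. (2012). Bounds of this kind typically rest on the binomial identity $E[1/(1+X)]=\{1-(1-p)^{n+1}\}/\{(n+1)p\}$ for $X\sim Bin(n,p)$. With $n=m_{0j}-1$ and $p=1-\lambda$ (the parameters of the binomial count in the first display of this proof), it gives $E[m_{0j}(1-\lambda )/(1+X)]=1-\lambda ^{m_{0j}}\le 1$. That this is exactly the fact the authors invoke is our assumption; the exact-arithmetic check below only verifies the identity itself, and the value $\lambda =1/2$ is hypothetical.

```python
from fractions import Fraction
from math import comb


def mean_inverse_one_plus_binomial(n, p):
    """Exact E[1 / (1 + X)] for X ~ Bin(n, p), summed over the probability mass function."""
    return sum(Fraction(comb(n, k)) * p**k * (1 - p) ** (n - k) / (k + 1) for k in range(n + 1))


def closed_form(n, p):
    # Uses C(n, k) / (k + 1) = C(n + 1, k + 1) / (n + 1) and the binomial theorem.
    return (1 - (1 - p) ** (n + 1)) / ((n + 1) * p)


lam = Fraction(1, 2)  # hypothetical tuning parameter lambda
for m0 in (1, 5, 20):
    exact = mean_inverse_one_plus_binomial(m0 - 1, 1 - lam)
    assert exact == closed_form(m0 - 1, 1 - lam)
    # m0 * (1 - lambda) * E[1 / (1 + X)] equals 1 - lambda**m0, hence is at most 1.
    print(m0, m0 * (1 - lam) * exact)
```

Exact rational arithmetic (`Fraction`) is used so that the identity is checked with equality rather than up to floating-point error.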

Proof of Theorem 5

Without loss of generality, we suppose the non-zero components of ${\varvec{\mu }}_i$ are all positive. Then

$$\begin{aligned} \begin{aligned} mdFDR_{GW}&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr \{P_{ij}\le \frac{r\widehat{w}_{ij}}{mq} \alpha , R^{(-i)}=r-1 \}\\&\quad +I(\mu _{ij}>0)Pr\{P_{ij}\le \frac{r\widehat{w}_{ij}}{mq} \alpha , R^{(-i)}=r-1, T_{ij}<0 \}\bigg )\\&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr\{P_{ij} \le \frac{\widehat{w}_{ij,i}r}{mq} \alpha , R^{(-i)}=r-1 \}\\&\quad +I(\mu _{ij}>0)Pr\{P_{ij}\le \frac{\widehat{w}_{ij,i}r}{mq} \alpha , R^{(-i)}=r-1, T_{ij}<0 \}\bigg ), \end{aligned} \end{aligned}$$

where $\widehat{w}_{ij,i}=\sup _{0\le P_{ij}\le 1,\,j=1,\ldots ,q}\widehat{w}_{ij}$, and $R^{(-i)}$ is the number of rejections, excluding the rejection of $H_i$, made by the first step of the GW procedure when $P_{ij}$ is replaced by 0. Then

$$\begin{aligned} mdFDR_{GW}&\le \sum _{r=1}^m\sum _{j=1}^q\sum _{i=1}^m E_{\mathbf{P}^{(-ij)}} \frac{\widehat{w}_{ij,i}\alpha }{mq}\big [ I(\mu _{ij}=0)\\&\quad +\frac{1}{2} I(\mu _{ij}>0) \big ]Pr(R^{(-i)}=r-1|\mathbf{P}^{(-ij)}) \\&=\sum _{r=1}^m\sum _{j=1}^q\sum _{i=1}^m E_{\mathbf{P}^{(-ij)}}\frac{(w_{ij}+o_p(1)) \alpha }{mq}\big [ I(\mu _{ij}=0)+\frac{1}{2} I(\mu _{ij}>0)\big ]\\&\quad \times Pr(R^{(-i)}=r-1|\mathbf{P}^{(-ij)}) \\&=\frac{1}{2mq}\sum _{i,j}(1+I(\mu _{ij}=0)) w_{ij}\alpha +o(1)\le \alpha +o(1). \end{aligned}$$

The proof is completed. $\square $


About this article

Cite this article

Zhao, H., Fung, W.K. Controlling mixed directional false discovery rate in multidimensional decisions with applications to microarray studies. TEST 27, 316–337 (2018). https://doi.org/10.1007/s11749-017-0547-1


Received: 02 October 2016
Accepted: 15 June 2017
Published: 23 June 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11749-017-0547-1

