Controlling mixed directional false discovery rate in multidimensional decisions with applications to microarray studies

  • Original Paper
  • TEST

Abstract

Time-course microarray experiments harvest samples at several time points. To reveal dynamic changes in gene expression over time, we need to identify the significant genes and detect their expression patterns, a step that may introduce directional errors. Guo et al. (Biometrics 66(2):485–492, 2010) introduced a procedure that controls the mixed directional false discovery rate (mdFDR), the sum of the expected proportions of Type I and Type III errors among all rejections. In this paper, we develop weighted p value procedures for mdFDR control and give sufficient conditions that ensure (asymptotic) mdFDR control. Several weights and their estimators are shown to satisfy these conditions. The proposed weighted p value procedures are compared with the existing method in extensive simulations. Based on the proposed weighted p value procedures, we provide multiple confidence intervals (CIs) that control the false coverage-statement rate (FCR). We use the proposed methods to analyze the time-course microarray data studied in Lobenhofer et al. (Mol Endocrinol 16:1215–1229, 2002). Most of our findings agree with those obtained by the existing method. In addition, we identify some other important genes, such as CDKN3 and NQO1.
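To make the error criterion concrete, the following minimal sketch computes the realized proportion of Type I and Type III errors among rejected genes for one data set; the mdFDR is the expectation of this quantity. It assumes that a rejected gene counts as erroneous when any of its component decisions is wrong, which mirrors how the proofs in the Appendix bound the error. The function name and the coding of decisions as -1, 0, +1 are illustrative choices, not notation from the paper.

```python
import numpy as np

def mixed_directional_fdp(mu, decisions):
    """Realized proportion of rejected genes carrying a Type I or Type III error.

    mu        : (m, q) array of true effects; 0 means the component null is true.
    decisions : (m, q) array with entries in {-1, 0, +1}; 0 means "no decision",
                +1 / -1 are the declared directions (hypothetical coding).
    """
    mu = np.asarray(mu, dtype=float)
    decisions = np.asarray(decisions, dtype=int)
    declared = decisions != 0
    # Type I: a direction is declared although the component effect is zero.
    type1 = declared & (mu == 0)
    # Type III: a direction is declared with the wrong sign.
    type3 = declared & (mu != 0) & (np.sign(mu) != decisions)
    gene_rejected = declared.any(axis=1)
    gene_in_error = (type1 | type3).any(axis=1)
    n_rejected = int(gene_rejected.sum())
    return 0.0 if n_rejected == 0 else float(gene_in_error.sum()) / n_rejected

# Three genes, two time points: one Type I error, one correct gene, one Type III error.
mu = [[0, 0], [1, 2], [1, -1]]
decisions = [[1, 0], [1, 1], [-1, 0]]
fdp = mixed_directional_fdp(mu, decisions)
```

Averaging this proportion over repeated simulated data sets gives a Monte Carlo estimate of the mdFDR under the stated counting convention.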

References

  • Arbeitman M, Furlong E, Imam F, Johnson E, Null B, Baker B, Krasnow M, Scott M, Davis R, White K (2002) Gene expression during the life cycle of Drosophila melanogaster. Science 297(5590):2270–2275

  • Asher G, Lotem J, Kama R, Sachs L, Shaul Y (2002) NQO1 stabilizes p53 through a distinct pathway. Proc Natl Acad Sci USA 99(1):3099–3104

  • Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57(1):289–300

  • Benjamini Y, Hochberg Y (1997) Multiple hypotheses testing with weights. Scand J Stat 24(3):407–418

  • Benjamini Y, Hochberg Y (2000) On the adaptive control of the false discovery rate in multiple testing with independent statistics. J Educ Behav Stat 25(1):60–83

  • Benjamini Y, Yekutieli D (2005) False discovery rate-adjusted multiple confidence intervals for selected parameters. J Am Stat Assoc 100:71–80

  • Benjamini Y, Krieger A, Yekutieli D (2006) Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3):491–507

  • Blanchard G, Roquain E (2009) Adaptive FDR control under independence and dependence. J Mach Learn Res 10:2837–2871

  • Ernst J, Nau GJ, Bar-Joseph Z (2005) Clustering short time series gene expression data. Bioinformatics 21:i159–i168

  • Finner H, Gontscharuk V (2009) Controlling the familywise error rate with plug-in estimator for the proportion of true null hypotheses. J R Stat Soc B 71(5):1031–1048

  • Genovese C, Wasserman L (2004) A stochastic process approach to false discovery control. Ann Stat 32(3):1035–1061

  • Genovese C, Roeder K, Wasserman L (2006) False discovery control with \(p\) value weighting. Biometrika 93(3):509–524

  • Gui J, Tosteson TD, Borsuk M (2012) Weighted multiple testing procedures for genomic studies. BioData Min 5(1):4–13

  • Guillemin K, Salama N, Tompkins L, Falkow S (2002) Cag pathogenicity island-specific responses of gastric epithelial cells to Helicobacter pylori infection. Proc Natl Acad Sci USA 99:15136–15141

  • Guo W, Sarkar SK, Peddada SD (2010) Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics 66(2):485–492

  • Hu JX, Zhao H, Zhou H (2010) False discovery rate control with groups. J Am Stat Assoc 105(491):1215–1227

  • Jin J, Cai T (2007) Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons. J Am Stat Assoc 102:495–506

  • Lee S, Reimer CL, Fang L, Iruela-Arispe LM, Aaronson SA (2000) Overexpression of kinase-associated phosphatase (KAP) in breast and prostate cancer and inhibition of the transformed phenotype by antisense KAP expression. Mol Cell Biol 20(5):1723–1732

  • Lobenhofer E, Bennett L, Cable P, Li L, Bushel P, Afshari C (2002) Regulation of DNA replication fork genes by 17β-estradiol. Mol Endocrinol 16:1215–1229

  • Meinshausen N, Rice J (2006) Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann Stat 34(1):373–393

  • Peddada SD, Lobenhofer E, Li L, Afshari C, Weinberg C, Umbach D (2003) Gene selection and clustering for time-course and dose response microarray experiments using order-restricted inference. Bioinformatics 19:834–841

  • Roeder K, Wasserman L (2009) Genome-wide significance levels and weighted hypothesis testing. Stat Sci 24(4):398–413

  • Sarkar SK, Guo W, Finner H (2012) On adaptive procedures controlling the familywise error rate. J Stat Plan Inference 142(3):65–78

  • Simes R (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika 73(3):751–754

  • Storey J (2002) A direct approach to false discovery rates. J R Stat Soc B 64(3):479–498

  • Sun W, Cai T (2007) Oracle and adaptive compound decision rules for false discovery rate control. J Am Stat Assoc 102:901–912

  • Tian B, Nowak D, Brasier A (2005) A TNF-induced gene expression program under oscillatory NF-kappaB control. BMC Genom 6(73):137–137

  • Wang L, Ramoni M, Sebastiani P (2006) Clustering short gene expression profiles. Lect Notes Comput Sci 3909:60–68

  • Wang L, Montano M, Rarick M, Sebastiani P (2008) Conditional clustering of temporal expression profiles. BMC Bioinform 9:147–147

Acknowledgements

The authors are grateful to two anonymous referees and an editor for constructive comments and suggestions, which have led to substantial improvement in the paper.

Author information

Corresponding author

Correspondence to Haibing Zhao.

Additional information

Haibing Zhao’s work was supported by a grant from the National Natural Science Foundation of China (NSFC) (No. 11471204).

Appendix

Proof of Theorem 1

We prove the result by following the argument of Guo et al. (2010). Without loss of generality, we suppose the non-zero components of \({\varvec{\mu }}_i=(\mu _{i1},\ldots ,\mu _{iq})\) are all positive. Then

$$\begin{aligned} \begin{aligned} mdFDR&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr\left\{ P_{ij}\le \frac{w_{ij}r}{mq} \alpha , R=r \right\} \\&\quad +I(\mu _{ij}>0)Pr\left\{ P_{ij}\le \frac{w_{ij}r}{mq} \alpha , R=r, T_{ij}<0 \right\} \bigg )\\&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0) +\frac{1}{2} I(\mu _{ij}>0)\bigg )Pr\left\{ R^{(-i)}=r-1 \right\} \frac{w_{ij}r}{mq} \alpha \\&=\sum _{j=1}^q\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)+\frac{1}{2} I(\mu _{ij}>0)\bigg ) \frac{w_{ij}}{mq} \alpha =\alpha , \end{aligned} \end{aligned}$$

where \(I(\cdot )\) is the indicator function and \(R^{(-i)}\) is the number of rejections, not counting the rejection of \(H_i\), in the BH step-up procedure with \(P_{ij}\) replaced by 0. Note that replacing \(P_{ij}\) by 0 leads to \(P_{i}=0\). The proof is completed. \(\square \)
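The thresholds \(w_{ij}r\alpha /(mq)\) in the bound above suggest a two-stage weighted step-up rule, which can be sketched as follows. This is only an illustration under assumptions that are not spelled out in this excerpt: the pooled p value of gene \(i\) is taken to be the weighted Bonferroni value \(\min (1, q\min _j P_{ij}/w_{ij})\), the BH step-up procedure is applied to the pooled values, and directions are declared by the sign of \(T_{ij}\). The function name `weighted_directional_bh` and its arguments are hypothetical.

```python
import numpy as np

def weighted_directional_bh(pvals, tstats, weights, alpha=0.05):
    """Two-stage weighted step-up sketch (assumed form, see the text above).

    pvals, tstats, weights : (m, q) arrays; weights must be positive.
    Returns (decisions, R): decisions has entries in {-1, 0, +1},
    R is the number of genes rejected in stage 1.
    """
    P = np.asarray(pvals, dtype=float)
    T = np.asarray(tstats, dtype=float)
    W = np.asarray(weights, dtype=float)
    m, q = P.shape

    # Stage 1 (assumption): pool each gene by weighted Bonferroni, then run BH.
    pooled = np.minimum(1.0, q * (P / W).min(axis=1))
    ordered = np.sort(pooled)
    passed = np.nonzero(ordered <= np.arange(1, m + 1) * alpha / m)[0]
    R = 0 if passed.size == 0 else int(passed[-1]) + 1

    decisions = np.zeros((m, q), dtype=int)
    if R == 0:
        return decisions, 0

    # Stage 2: within rejected genes, declare sign(T_ij) whenever
    # P_ij <= w_ij * R * alpha / (m * q).
    gene_rejected = pooled <= R * alpha / m
    significant = (P <= W * R * alpha / (m * q)) & gene_rejected[:, None]
    decisions[significant] = np.sign(T[significant]).astype(int)
    return decisions, R

# Toy input: 3 genes, 2 time points, unit weights.
pvals = [[0.001, 0.5], [0.6, 0.7], [0.004, 0.003]]
tstats = [[3.0, -0.5], [0.3, 0.2], [-2.9, 3.0]]
decisions, R = weighted_directional_bh(pvals, tstats, np.ones((3, 2)), alpha=0.05)
```

With unit weights the sketch reduces to an unweighted two-stage rule; larger weights relax the threshold for the corresponding component, so the weights have to satisfy a normalization such as the one used in the last equality of the proof for the bound to equal \(\alpha \).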

Proof of Theorem 2

Without loss of generality, we suppose the non-zero components of \({\varvec{\mu }}_i\) are all positive. Then

$$\begin{aligned} \begin{aligned} mdFDR&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr \{P_{ij}\le \frac{ \widehat{w}_{ij}r}{mq} \alpha , R=r \}\\&\quad +I(\mu _{ij}>0)Pr\left\{ P_{ij}\le \frac{ \widehat{w}_{ij}r}{mq} \alpha , R=r, T_{ij}<0 \right\} \bigg )\\&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr\{P_{ij} \le \frac{ \widehat{w}_{ij,-ij}r}{mq} \alpha , R^{(-i)}=r-1 \}\\&\quad +I(\mu _{ij}>0)Pr\left\{ P_{ij}\le \frac{ \widehat{w}_{ij,-ij}r}{mq} \alpha , R^{(-i)}=r-1, T_{ij}<0 \right\} \bigg ) \\&=\sum _{r=1}^m\frac{1}{mq}\sum _{i=1}^m\sum _{j=1}^q E_{\mathbf{P}^{(-ij)}}\widehat{w}_{ij,-ij} \alpha \big [ I(\mu _{ij}=0)+\frac{1}{2} I(\mu _{ij}>0)\big ]\\&\quad Pr(R^{(-i)}=r-1|\mathbf{P}^{(-ij)}) \\ {}&=\sum _{j=1}^q\sum _{i=1}^m E_{\mathbf{P}^{(-ij)}}\widehat{w}_{ij,-ij}\frac{1}{2mq} \big [ I(\mu _{ij}=0)+ 1\big ]\alpha \le \alpha , \end{aligned} \end{aligned}$$

where \(\mathbf{P}^{(-ij)}\) is the collection of \(P_{i^{\prime }j^{\prime }}, i^{\prime }=1,\ldots ,m, j^{\prime }=1,\ldots ,q,\) excluding \(P_{ij}\).

The proof is completed. \(\square \)
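The factor \(\frac{1}{2}\) attached to \(I(\mu _{ij}>0)\) in the proofs of Theorems 1 and 2 can be illustrated in a simple special case. Assume, purely for illustration and not as a condition stated in this excerpt, that \(T_{ij}\sim N(\mu _{ij},1)\) with two-sided p value \(P_{ij}=2\{1-\Phi (|T_{ij}|)\}\), where \(\Phi \) is the standard normal distribution function. For a fixed constant \(c\in (0,1)\), write \(z_{c/2}=\Phi ^{-1}(1-c/2)\). If \(\mu _{ij}>0\), then

$$\begin{aligned} Pr\left\{ P_{ij}\le c, T_{ij}<0 \right\} =Pr\left\{ T_{ij}\le -z_{c/2} \right\} =\Phi (-z_{c/2}-\mu _{ij})\le \Phi (-z_{c/2})=\frac{c}{2}. \end{aligned}$$

A directional (Type III) error therefore costs at most half of the level spent on a true null component, for which \(Pr\{P_{ij}\le c\}=c\).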

Proof of Theorem 3

Without loss of generality, we assume \(H_{ij}, i=1,\ldots ,m_{0j},\) to be true for each j. Note that, for \(i=1,\ldots ,m_{0j},\)

$$\begin{aligned} \begin{aligned} \sum _{i^{\prime }=1,i^{\prime }\ne i}^{m_{0j}}I(P_{i^{\prime }j}> \lambda )+1\sim 1+Bin(m_{0j}-1,1-\lambda ) \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \frac{(m+m_{0j})(1-\lambda )}{m(1-\lambda )+(1-\lambda )\widehat{\pi }_{0j,-ij}}\le \frac{(m+m_{0j})}{m+(1-\lambda )\widehat{\pi }_{0j,-ij}}, \end{aligned} \end{aligned}$$

where \(Bin(a, b)\) is the binomial distribution with parameters \((a, b)\). Suppose \(m_{0j}>0\), then we have

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\sum _{i,j}(1+I(\mu _{ij}=0))E_{\mathbf{P}^{(-ij)}}\widehat{w}^a_{ij,-ij}\le E_{\mathbf{P}^{(-1j)}}\frac{m+m_{0j}}{1+\widehat{\pi }_{0j,-1j}}\\&\quad \le E_{(\mathbf{P}^{(-1j)},DU)}\frac{m+m_{0j}}{1+\widehat{\pi }_{0j,-1j}}\\&\quad \le m E_{(\mathbf{P}^{(-1j)},DU)}\frac{(m+m_{0j})(1-\lambda )}{m(1-\lambda )+(1-\lambda )m\widehat{\pi }_{0j,-1j}}\\&\quad \le m E_{(\mathbf{P}^{(-1j)},DU)}\frac{m+m_{0j}}{m+(1-\lambda )m\widehat{\pi }_{0j,-1j}}\\&\quad \le m E_{(\mathbf{P}^{(-1j)},DU)}\frac{m_{0j}}{(1-\lambda )m\widehat{\pi }_{0j,-1j}}\le m, \end{aligned} \end{aligned}$$
(7.1)

where the subscript DU of the expectation means that it is calculated under the Dirac-uniform configuration, which assumes that the p values corresponding to the false null hypotheses are 0 and the p values corresponding to the true null hypotheses are i.i.d. \(U(0, 1)\). The last inequality in (7.1) holds by Sarkar et al. (2012), and the second-to-last inequality follows from \(m_{0j}\ge (1-\lambda )m\widehat{\pi }_{0j,-1j}\) under the Dirac-uniform configuration. Thus, the weights \(\widehat{w}^a_{ij}\) satisfy \(\sum _{i,j}(1+I(\mu _{ij}=0))E\widehat{w}^a_{ij,-ij}\le 2mq\). Obviously, \(\widehat{w}^a_{ij}\le \widehat{w}^a_{ij,-ij}\). By Theorem 2, the data-driven Apro1 controls the mdFDR. The proof is completed. \(\square \)
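Bounds of this kind usually rest on a standard identity for a binomial count: if \(W\sim Bin(n,p)\), then \(E\{1/(1+W)\}=\{1-(1-p)^{n+1}\}/\{(n+1)p\}\le 1/\{(n+1)p\}\). Whether this is exactly the step taken from Sarkar et al. (2012) is an assumption here, since the definition of \(\widehat{\pi }_{0j,-ij}\) is not part of this excerpt. The short check below verifies the identity numerically for \(n=m_{0j}-1\) and \(p=1-\lambda \); the values of `m0` and `lam` are arbitrary illustrative choices.

```python
from math import comb

def mean_inverse_one_plus_binomial(n, p):
    """Exact E[1 / (1 + W)] for W ~ Bin(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) / (k + 1) for k in range(n + 1))

def closed_form(n, p):
    """Closed form (1 - (1 - p)**(n + 1)) / ((n + 1) * p), valid for 0 < p <= 1."""
    return (1 - (1 - p)**(n + 1)) / ((n + 1) * p)

m0, lam = 20, 0.5                     # illustrative values, not from the paper
exact = mean_inverse_one_plus_binomial(m0 - 1, 1 - lam)
formula = closed_form(m0 - 1, 1 - lam)
upper = 1 / (m0 * (1 - lam))          # the bound 1 / ((n + 1) * p)
print(exact, formula, upper)
```

The exact sum and the closed form agree up to floating-point error, and both stay below the upper bound, with the gap shrinking as \(m_{0j}\) grows.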

Proof of Theorem 5

Without loss of generality, we suppose the non-zero components of \({\varvec{\mu }}_i\) are all positive. Then

$$\begin{aligned} \begin{aligned} mdFDR_{GW}&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr \{P_{ij}\le \frac{r\widehat{w}_{ij}}{mq} \alpha , R^{(-i)}=r-1 \}\\&\quad +I(\mu _{ij}>0)Pr\{P_{ij}\le \frac{r\widehat{w}_{ij}}{mq} \alpha , R^{(-i)}=r-1, T_{ij}<0 \}\bigg )\\&\le \sum _{r=1}^m\sum _{j=1}^q\frac{1}{r}\sum _{i=1}^m\bigg ( I(\mu _{ij}=0)Pr\{P_{ij} \le \frac{\widehat{w}_{ij,i}r}{mq} \alpha , R^{(-i)}=r-1 \}\\&\quad +I(\mu _{ij}>0)Pr\{P_{ij}\le \frac{\widehat{w}_{ij,i}r}{mq} \alpha , R^{(-i)}=r-1, T_{ij}<0 \}\bigg ), \end{aligned} \end{aligned}$$

where \(\widehat{w}_{ij,i}=\sup _{0\le P_{ij}\le 1,j=1,\ldots ,q}\widehat{w}_{ij}\), and \(R^{(-i)}\) is the number of rejections, not counting the rejection of \(H_i\), in the first step of the GW procedure with \(P_{ij}\) replaced by 0. Then

$$\begin{aligned} \begin{aligned} mdFDR_{GW}&\le \sum _{r=1}^m\sum _{j=1}^q\sum _{i=1}^m E_{\mathbf{P}^{(-ij)}} \frac{\widehat{w}_{ij,i}\alpha }{mq}\big [ I(\mu _{ij}=0)\\&\quad +\frac{1}{2} I(\mu _{ij}>0) \big ]Pr(R^{(-i)}=r-1|\mathbf{P}^{(-ij)}) \\ {}&=\sum _{r=1}^m\sum _{j=1}^q\sum _{i=1}^m E_{\mathbf{P}^{(-ij)}}\frac{(w_{ij}+o_p(1)) \alpha }{mq}\big [ I(\mu _{ij}=0)+\frac{1}{2} I(\mu _{ij}>0)\big ]\\ {}&\ \ \times Pr(R^{(-i)}=r-1|\mathbf{P}^{(-ij)}) \\ {}&=\frac{1}{2mq}\sum _{i,j}(1+I(\mu _{ij}=0)) w_{ij}\alpha +o(1)\le \alpha +o(1). \end{aligned} \end{aligned}$$

The proof is completed. \(\square \)

About this article

Cite this article

Zhao, H., Fung, W.K. Controlling mixed directional false discovery rate in multidimensional decisions with applications to microarray studies. TEST 27, 316–337 (2018). https://doi.org/10.1007/s11749-017-0547-1

  • DOI: https://doi.org/10.1007/s11749-017-0547-1
