Abstract
A common procedure for estimating the number of genes that are differentially expressed (DE) in both of two experiments involves two steps. In the first step, data from the two experiments are analyzed separately to produce a list of genes declared to be DE in each experiment. Usually, each list is produced using a method that attempts to control the false discovery rate (FDR) in each experiment at some desired level α. In the second step, the number of genes common to both lists is used as an estimate of the number of genes DE in both experiments. A problem with this approach is that the resulting estimates can vary greatly with α, and the value of α that produces the best estimate for any given pair of experiments is difficult to predict. We propose a method that uses the p-values from both experiments simultaneously to produce a single estimate, one that does not depend on the FDR level α, of the number of genes that are DE in both experiments. We use two simulation studies (one involving independent, normally distributed data and one involving microarray data) to compare the performance of our proposed method, the commonly used method, and another method proposed in the literature to test for consistency of replicate experiments. The results of the simulation studies demonstrate the advantages of our approach. We conclude the article by estimating the number of genes that are DE in both of two experiments involving gene expression in maize leaves.
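The two-step procedure described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the FDR-controlling method, so the Benjamini-Hochberg step-up procedure is assumed here, and the function names (`bh_reject`, `intersection_estimate`, `simulate`) and all simulation settings are hypothetical choices made only for illustration. The sketch covers the commonly used method only, not the proposed estimator.

```python
import math
import random


def bh_reject(pvals, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (assumed FDR method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value is at or below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def intersection_estimate(p1, p2, alpha):
    """Two-step estimate: number of genes on both per-experiment DE lists."""
    return len(bh_reject(p1, alpha) & bh_reject(p2, alpha))


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(m=2000, n_both=200, n_only=200, effect=3.0, seed=1):
    """Toy data with hypothetical settings: the first n_both genes are DE in both
    experiments, the next n_only in experiment 1 only, the next n_only in experiment 2 only."""
    rng = random.Random(seed)
    p1, p2 = [], []
    for g in range(m):
        de1 = g < n_both + n_only
        de2 = g < n_both or n_both + n_only <= g < n_both + 2 * n_only
        p1.append(two_sided_p(rng.gauss(effect if de1 else 0.0, 1.0)))
        p2.append(two_sided_p(rng.gauss(effect if de2 else 0.0, 1.0)))
    return p1, p2


if __name__ == "__main__":
    p1, p2 = simulate()
    # The overlap count changes as the nominal FDR level changes.
    for alpha in (0.01, 0.05, 0.10, 0.20):
        print(alpha, intersection_estimate(p1, p2, alpha))
```

Because Benjamini-Hochberg rejection sets are nested as α increases, the overlap count in this sketch can only stay the same or grow with α, which is one way to see why the two-step estimate depends on the chosen level.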
Cite this article
Orr, M., Liu, P. & Nettleton, D. Estimating the Number of Genes That Are Differentially Expressed in Both of Two Independent Experiments. JABES 17, 583–600 (2012). https://doi.org/10.1007/s13253-012-0108-8