Genomic Outlier Detection in High-Throughput Data Analysis

  • Debashis Ghosh
Part of the Methods in Molecular Biology book series (MIMB, volume 972)


In the analysis of high-throughput data, a very common goal is the detection of genes that are differentially expressed between two groups or classes. A recent finding from the prostate cancer literature demonstrates that, by searching for a different pattern of differential expression, new candidate oncogenes might be found. In this chapter, we describe the resulting statistical problem, termed oncogene outlier detection, and discuss a variety of proposals for addressing it. A statistical model for the multiclass situation is described, and links with multiple testing concepts are established. Some new nonparametric procedures are described and compared to existing methods using simulation studies.

Key words

cDNA microarrays; Cancer; Differential expression; Multiple comparisons; Rank-based statistic



The author would like to acknowledge the support of the Huck Institutes of Life Sciences at Penn State University and NIH Grant R01GM72007.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Departments of Statistics and Public Health Sciences, Penn State University, DuBois, USA
