A Symmetric Length-Aware Enrichment Test

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2015)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9029)

Abstract

Young et al. [14] showed that, because of gene length bias, the popular Fisher’s Exact Test should not be used to study the association between a group of differentially expressed (DE) genes and a specific Gene Ontology (GO) category. Instead, they suggest a test in which one conditions on the genes in the GO category and draws the pseudo-DE genes according to a length-dependent distribution. The same model was presented in a different context by Kazemian et al. [8], who went on to offer a dynamic programming (DP) algorithm to compute the significance of the proposed test exactly. Here we point out that, while valid, the test proposed by these authors is no longer symmetric as Fisher’s Exact Test is: one gets a different answer when conditioning on the observed GO category than when conditioning on the DE set. As an alternative, we offer a symmetric generalization of Fisher’s Exact Test and provide efficient algorithms to evaluate its significance.
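The symmetry property mentioned in the abstract can be illustrated for the classical Fisher’s Exact Test itself. The following is a minimal sketch, not the authors' method: it computes the one-sided hypergeometric tail probability and shows that swapping the roles of the GO category and the DE set leaves the p-value unchanged. The function name `fisher_enrichment_pvalue` and all gene counts are hypothetical and chosen only for illustration; the length-aware tests discussed in the paper are not implemented here.

```python
from math import comb


def fisher_enrichment_pvalue(n_genes, n_go, n_de, n_overlap):
    """One-sided Fisher's Exact Test p-value, P(overlap >= n_overlap).

    Hypothetical helper: assumes the classical null in which every gene
    is equally likely to be DE (no length bias).
    """
    total = comb(n_genes, n_de)          # ways to choose the DE set
    upper = min(n_go, n_de)              # largest possible overlap
    tail = sum(
        comb(n_go, k) * comb(n_genes - n_go, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Illustrative counts only: 1000 genes, 50 in the GO category,
# 100 DE genes, 12 genes in both.
p_condition_on_go = fisher_enrichment_pvalue(1000, 50, 100, 12)
p_condition_on_de = fisher_enrichment_pvalue(1000, 100, 50, 12)

# Swapping the GO category and the DE set gives the same p-value.
print(p_condition_on_go, p_condition_on_de)
```

A length-dependent sampling scheme that conditions on only one of the two sets would not, in general, pass this swap check, which is the asymmetry the abstract points out.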

References

  1. Agresti, A.: A survey of exact inference for contingency tables. Statistical Science 7, 131–153 (1992)

  2. Butler, R.W.: Saddlepoint Approximations with Applications. Cambridge University Press, Cambridge (2007)

  3. Cleveland, W.S., Devlin, S.J.: Locally-weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83, 596–610 (1988)

  4. The Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nature Genetics 25, 25–29 (2000)

  5. Cowell, W.R. (ed.): Sources and Development of Mathematical Software. Prentice-Hall Series in Computational Mathematics, Cleve Moler, Advisor. Prentice-Hall, Upper Saddle River, NJ 07458, USA (1984)

  6. Fisher, R.A.: Statistical Methods for Research Workers, 14th edn. Oliver & Boyd, London (1970)

  7. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open source scientific tools for Python (2001)

  8. Kazemian, M., Zhu, Q., Halfon, M.S., Sinha, S.: Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison. Nucleic Acids Research 39(22), 9463–9472 (2011)

  9. Nieduszynski, C.A., Hiraga, S., Ak, P., Benham, C.J., Donaldson, A.D.: OriDB: a DNA replication origin database. Nucleic Acids Research 35(Database issue), D40–D46 (2007)

  10. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2006). ISBN 3-900051-07-0

  11. Scannell, D.R., Zill, O.A., Rokas, A., Payen, C., Dunham, M.J., Eisen, M.B., Rine, J., Johnston, M., Hittinger, C.T.: The awesome power of yeast evolutionary genetics: New genome sequences and strain resources for the Saccharomyces sensu stricto genus. G3 (Bethesda) 1(1), 11–25 (2011)

  12. Skovgaard, I.M.: Saddlepoint expansions for conditional distributions. J. Appl. Prob. 24, 875–887 (1987)

  13. Wallenius, K.T.: Biased sampling: the non-central hypergeometric probability distribution. PhD thesis, Stanford University (1963)

  14. Young, M.D., Wakefield, M.J., Smyth, G.K., Oshlack, A.: Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology 11, R14 (2010)

Author information

Corresponding author

Correspondence to Uri Keich.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Manescu, D., Keich, U. (2015). A Symmetric Length-Aware Enrichment Test. In: Przytycka, T. (eds) Research in Computational Molecular Biology. RECOMB 2015. Lecture Notes in Computer Science, vol 9029. Springer, Cham. https://doi.org/10.1007/978-3-319-16706-0_23

  • DOI: https://doi.org/10.1007/978-3-319-16706-0_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16705-3

  • Online ISBN: 978-3-319-16706-0

  • eBook Packages: Computer Science, Computer Science (R0)
