Abstract
Young et al. [14] showed that due to gene length bias the popular Fisher Exact Test should not be used to study the association between a group of differentially expressed (DE) genes and a specific Gene Ontology (GO) category. Instead they suggest a test where one conditions on the genes in the GO category and draws the pseudo DE expressed genes according to a length-dependent distribution. The same model was presented in a different context by Kazemian et al. who went on to offer a dynamic programming (DP) algorithm to exactly estimate the significance of the proposed test [8]. Here we point out that while valid, the test proposed by these authors is no longer symmetric as Fisher’s Exact Test is: one gets different answers if one conditions on the observed GO category than on the DE set. As an alternative we offer a symmetric generalization of Fisher’s Exact Test and provide efficient algorithms to evaluate its significance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Agresti, A.: A survey of exact inference for contingency tables. Statistical Science 7, 131–153 (1992)
Butler, R.W.: Saddlepoint Approximations with Applications. University Press, Cambridge (2007)
Cleveland, W.S., Devlin, S.J.: Locally-weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83, 596–610 (1988)
The Gene Ontology Consortium: Gene ontology: tool for the unification of biology. Nature Genetics 25, 25–29 (2000)
Cowell, W.R. (ed.): Sources and Development of Mathematical Software. Prentice-Hall Series in Computational Mathematics, Cleve Moler, Advisor. Prentice-Hall, Upper Saddle River, NJ 07458, USA (1984)
Fisher, R.A.: Statistical methods for research workers. Oliver & Boyd, London, 14th ed. edition (1970)
Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: Open source scientific tools for Python (2001)
Kazemian, M., Zhu, Q., Halfon, M.S., Sinha, S.: Improved accuracy of supervised crm discovery with interpolated markov models and cross-species comparison. Nucleic Acids Research 39(22), 9463–9472 (2011)
Nieduszynski, C.A., Hiraga, S., Ak, P., Benham, C.J., Donaldson, A.D.: Oridb: a dna replication origin database. Nucleic. Acids Res. 35(Database issue), D40–D46 (2007)
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2006). ISBN 3-900051-07-0
Scannell, D.R., Zill, O.A., Rokas, A., Payen, C., Dunham, M.J., Eisen, M.B., Rine, J., Johnston, M., Hittinger, C.T.: The awesome power of yeast evolutionary genetics: New genome sequences and strain resources for the saccharomyces sensu stricto genus. G3 (Bethesda) 1(1), 11–25 (2011)
Skovgaard, I.M.: Saddlepoint expansions for conditional distributions. J. Appl. Prob. 24, 875–87 (1987)
Wallenius, K.T.: Biased sampling: the non-central hypegeometric probability distribution. PhD thesis, Stanford University (1963)
Young, M.D., Wakefield, M.J., Smyth, G.K., Oshlack, A.: Gene ontology analysis for rna-seq: accounting for selection bias. Genome Biology 11(R14), 11 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Manescu, D., Keich, U. (2015). A Symmetric Length-Aware Enrichment Test. In: Przytycka, T. (eds) Research in Computational Molecular Biology. RECOMB 2015. Lecture Notes in Computer Science(), vol 9029. Springer, Cham. https://doi.org/10.1007/978-3-319-16706-0_23
Download citation
DOI: https://doi.org/10.1007/978-3-319-16706-0_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16705-3
Online ISBN: 978-3-319-16706-0
eBook Packages: Computer ScienceComputer Science (R0)