On the Use of Statistics in Genomics and Bioinformatics

Ewens, Warren J.

doi:10.1080/15598608.2008.10411868


Published: 01 June 2008

Volume 2, pages 159–172 (2008)

Journal of Statistical Theory and Practice



Abstract

The human genome project and other genome projects provide us with rich sources of data which invite many new forms of statistical analysis. The nature of the data is often different from that in many other areas of science. This has led to novel forms of data analysis, not to be found in the classical statistical literature. The purpose of this paper is to describe some of these new forms, with a focus on those cases where the biology drives the questions asked, and the statistical analysis presents new features as well as raising further challenges.




Author information

Authors and Affiliations

Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
Warren J. Ewens


Corresponding author

Correspondence to Warren J. Ewens.


About this article

Cite this article

Ewens, W.J. On the Use of Statistics in Genomics and Bioinformatics. J Stat Theory Pract 2, 159–172 (2008). https://doi.org/10.1080/15598608.2008.10411868


Received: 11 May 2007
Revised: 12 July 2007
Published: 01 June 2008
Issue Date: June 2008
DOI: https://doi.org/10.1080/15598608.2008.10411868

