On the Use of Statistics in Genomics and Bioinformatics

Journal of Statistical Theory and Practice

Abstract

The Human Genome Project and other genome projects provide us with rich sources of data which invite many new forms of statistical analysis. The nature of the data is often different from that in many other areas of science. This has led to novel forms of data analysis, not to be found in the classical statistical literature. The purpose of this paper is to describe some of these new forms, with a focus on those cases where the biology drives the questions asked, and the statistical analysis presents new features as well as raising further challenges.



Author information


Correspondence to Warren J. Ewens.

About this article

Cite this article

Ewens, W.J. On the Use of Statistics in Genomics and Bioinformatics. J Stat Theory Pract 2, 159–172 (2008). https://doi.org/10.1080/15598608.2008.10411868
