Journal of Statistical Theory and Practice, Volume 2, Issue 2, pp 159–172

On the Use of Statistics in Genomics and Bioinformatics

  • Warren J. Ewens


The Human Genome Project and other genome projects provide us with rich sources of data that invite many new forms of statistical analysis. The nature of the data is often different from that in many other areas of science. This has led to novel forms of data analysis that are not to be found in the classical statistical literature. The purpose of this paper is to describe some of these new forms, with a focus on those cases where the biology drives the questions asked and the statistical analysis both presents new features and raises further challenges.


BLAST; motifs; microarrays and the multiple testing problem; the false discovery rate concept; evolutionary models

AMS Subject Classification

62P12 60G70 60G60 60J20 





Copyright information

© Grace Scientific Publishing 2008

Authors and Affiliations

  1. Department of Biology, University of Pennsylvania, Philadelphia, USA
