Uniform Accuracy of the Maximum Likelihood Estimates for Probabilistic Models of Biological Sequences

  • Svetlana Ekisheva
  • Mark Borodovsky

Abstract

Probabilistic models for biological sequences (DNA and proteins) have many useful applications in bioinformatics. Normally, the parameter values of these models have to be estimated from empirical data. However, the properties of even the most common estimates, the maximum likelihood (ML) estimates, have not been completely explored. Here we assess the uniform accuracy of the ML estimates for models of several types: the independence model, the Markov chain, and the hidden Markov model (HMM). In particular, we derive rates of decay of the maximum estimation error by employing measure concentration as well as the Gaussian approximation, and we compare these rates.
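
As a rough illustration of what "maximum estimation error" means in the simplest case, the independence model, the following Python sketch computes ML estimates of letter frequencies and compares the observed maximum error with a textbook concentration bound. The bound used here (Hoeffding's inequality combined with a union bound over the alphabet) is a standard result chosen for illustration and is an assumption of this sketch; it is not necessarily the bound derived in the article. The function names, the DNA alphabet, and the example frequencies are likewise hypothetical.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"  # assumed DNA alphabet, for illustration only


def ml_frequencies(seq):
    """ML estimates under the independence model: relative letter frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return {a: counts[a] / n for a in ALPHABET}


def max_error(estimate, truth):
    """Maximum absolute estimation error over all letters of the alphabet."""
    return max(abs(estimate[a] - truth[a]) for a in ALPHABET)


def hoeffding_union_bound(n, eps, k=len(ALPHABET)):
    """Upper bound on P(max_a |p_hat_a - p_a| >= eps) for n i.i.d. letters.

    Hoeffding gives 2*exp(-2*n*eps^2) per letter; the union bound over
    k letters multiplies this by k. The result is capped at 1.
    """
    return min(1.0, 2 * k * math.exp(-2 * n * eps * eps))


def simulate(truth, n, rng):
    """Draw an i.i.d. sequence of length n from the given letter distribution."""
    letters = list(truth)
    weights = [truth[a] for a in letters]
    return "".join(rng.choices(letters, weights=weights, k=n))


if __name__ == "__main__":
    rng = random.Random(0)
    truth = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}  # hypothetical parameters
    for n in (100, 1000, 10000):
        seq = simulate(truth, n, rng)
        err = max_error(ml_frequencies(seq), truth)
        print(n, round(err, 4), hoeffding_union_bound(n, 0.05))
```

The bound decays exponentially in the sequence length n for a fixed error threshold, which is the kind of rate the abstract refers to. Markov chains and HMMs require concentration results for dependent variables and are not covered by this sketch.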

Keywords

Maximum likelihood estimate; Asymptotic properties of estimates; Hidden Markov model; Concentration of measure

AMS 2000 Subject Classifications

62M05 60J10 62F10 11L07 

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Mathematics, Syktyvkar State University, Syktyvkar, Russia
  2. Wallace H. Coulter Department of Biomedical Engineering and Computational Science and Engineering Division, Georgia Institute of Technology, Atlanta, USA
