A Method to Design Standard HMMs with Desired Length Distribution for Biological Sequence Analysis

  • Hongmei Zhu
  • Jiaxin Wang
  • Zehong Yang
  • Yixu Song
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)

Abstract

Motivation: Hidden Markov Models (HMMs) have been widely used for biological sequence analysis. When modeling a phenomenon in which, for instance, the nucleotide distribution does not change across DNA segments of various lengths, there are two popular approaches to achieving a desired length distribution: explicit and implicit modeling. Implicit modeling requires an elaborately designed model structure. So far, no general procedure is available for designing such a model structure automatically from the training data.

Results: We present an iterative algorithm that designs the structure of a standard HMM with a desired length distribution from the training data. The basic idea behind the algorithm is to use multiple shifted negative binomial distributions to model the empirical length distribution. Each negative binomial distribution is obtained from an array of n states, each with the same self-transition probability. We shift this negative binomial distribution by placing a series of linearly connected states before the negative binomial model.
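The shifted negative binomial building block described above can be illustrated with a minimal sketch. It is not the authors' iterative design algorithm; it only checks, under stated assumptions, that a chain of states yields the expected length distribution. The assumptions are that every state emits exactly one symbol per visit, that the shifting states have no self-loop, and that each of the n looping states stays in place with the same probability p. The function names and the parameters `shift` and `max_len` are hypothetical and chosen for illustration only.

```python
from math import comb


def shifted_negbin_pmf(length, n, p, shift):
    """Closed-form P(total length == length) for `shift` no-loop states
    followed by n states that each self-loop with probability p."""
    k = length - shift - n  # number of self-loop transitions taken
    if k < 0:
        return 0.0
    return comb(k + n - 1, n - 1) * p**k * (1 - p)**n


def chain_length_distribution(n, p, shift, max_len):
    """Same distribution, obtained by pushing probability mass through
    the linear chain of states one emitted symbol at a time."""
    total = shift + n
    # occupancy[i]: probability of sitting in state i after `length` symbols
    occupancy = [0.0] * total
    occupancy[0] = 1.0
    dist = [0.0] * (max_len + 1)
    for length in range(1, max_len + 1):
        nxt = [0.0] * total
        for i, mass in enumerate(occupancy):
            stay = p if i >= shift else 0.0  # shifting states never loop
            nxt[i] += mass * stay
            if i + 1 < total:
                nxt[i + 1] += mass * (1 - stay)
            else:
                dist[length] += mass * (1 - stay)  # leave the chain
        occupancy = nxt
    return dist


if __name__ == "__main__":
    dist = chain_length_distribution(n=3, p=0.5, shift=2, max_len=12)
    for length in range(1, 13):
        print(length, dist[length], shifted_negbin_pmf(length, 3, 0.5, 2))
```

In this sketch the shortest possible length is `shift + n`, so adding shifting states moves the whole distribution to the right without changing its shape, while n and p control that shape.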

Keywords

Hidden Markov Model, Length Distribution, Negative Binomial Distribution, Negative Binomial Model, Biological Sequence Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hongmei Zhu (1)
  • Jiaxin Wang (1)
  • Zehong Yang (1)
  • Yixu Song (1)
  1. Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China