Segmentation with an Isochore Distribution

  • Miklós Csűrös
  • Ming-Te Cheng
  • Andreas Grimm
  • Amine Halawani
  • Perrine Landreau
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)


We introduce a novel generative probabilistic model for segmentation problems in molecular sequence analysis. In the model, all segmentations that satisfy given minimum segment length requirements are equally likely. We show that segmentation-related problems can be solved as efficiently as in hidden Markov models. In particular, we show how the best segmentation, as well as the posterior segment class probabilities at individual sequence positions, can be computed in O(nC) time for C segment classes and a sequence of length n.
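To make the O(nC) claim concrete, the following is a minimal sketch of a dynamic program that finds a best-scoring segmentation under minimum segment length constraints. It is not the authors' algorithm, whose details are not given in this abstract. It assumes that each position contributes an additive log-score per class, and the names `best_segmentation`, `scores`, and `min_len` are hypothetical. Adjacent same-class pieces are merged in the output, which leaves the optimal score unchanged.

```python
NEG_INF = float("-inf")

def best_segmentation(scores, min_len):
    """Best segmentation under minimum segment lengths (illustrative sketch).

    scores[c][i]: assumed additive log-score of position i under class c.
    min_len[c]:   assumed minimum length (>= 1) of a class-c segment.
    Returns (best_score, [(start, end_exclusive, class), ...]).
    """
    C, n = len(scores), len(scores[0])
    # prefix[c][i] = sum of scores[c][0:i], so any window sum costs O(1).
    prefix = [[0.0] * (n + 1) for _ in range(C)]
    for c in range(C):
        for i in range(n):
            prefix[c][i + 1] = prefix[c][i] + scores[c][i]

    # end[c][i]: best score for positions 0..i-1 when the last segment has class c.
    end = [[NEG_INF] * (n + 1) for _ in range(C)]
    extended = [[False] * (n + 1) for _ in range(C)]
    best = [NEG_INF] * (n + 1)      # best[i] = max over c of end[c][i]
    best_cls = [-1] * (n + 1)
    best[0] = 0.0                   # the empty prefix is trivially segmented

    for i in range(1, n + 1):
        for c in range(C):
            m = min_len[c]
            # Option 1: open a new segment of exactly m positions ending at i.
            start_new = NEG_INF
            if i >= m and best[i - m] > NEG_INF:
                start_new = best[i - m] + prefix[c][i] - prefix[c][i - m]
            # Option 2: extend a class-c segment that already ends at i - 1.
            extend = end[c][i - 1] + scores[c][i - 1]
            if extend > start_new:
                end[c][i], extended[c][i] = extend, True
            else:
                end[c][i] = start_new
            if end[c][i] > best[i]:
                best[i], best_cls[i] = end[c][i], c

    if best[n] == NEG_INF:
        return NEG_INF, []          # no segmentation satisfies the constraints

    # Trace back per-position labels, then compress them into segments.
    labels = [-1] * n
    i, c = n, best_cls[n]
    while i > 0:
        if extended[c][i]:
            labels[i - 1] = c
            i -= 1
        else:
            m = min_len[c]
            for j in range(i - m, i):
                labels[j] = c
            i -= m
            if i > 0:
                c = best_cls[i]

    segments, start = [], 0
    for j in range(1, n + 1):
        if j == n or labels[j] != labels[start]:
            segments.append((start, j, labels[start]))
            start = j
    return best[n], segments
```

Each position is visited once per class and every step takes constant time, so the running time is O(nC) regardless of the minimum lengths. The trick is that a segment is either opened at exactly its minimum length or extended by one position, so no inner loop over segment lengths is needed.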


Keywords: Posterior Probability; Hidden Markov Model; Good Segmentation; Generative Probabilistic Model; Segment Class
These keywords were added by machine and not by the authors.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Miklós Csűrös (1)
  • Ming-Te Cheng (1)
  • Andreas Grimm (2)
  • Amine Halawani (1)
  • Perrine Landreau (3)

  1. Department of Computer Science and Operations Research, Université de Montréal, Montréal, Canada
  2. Lehr- und Forschungseinheit für Bioinformatik, Ludwig-Maximilians-Universität München, München, Germany
  3. Institut Scientifique Polytechnique Galilée, Université Paris XIII, Villetaneuse, France
