Abstract
Hidden Markov models (HMMs) are effective tools for detecting series of statistically homogeneous structures, but they are not well suited to analysing complex structures. For example, the duration of stay in a state of an HMM must follow a geometric law. Numerous other methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions by using convolutions of geometric distributions. To this end, we introduce macro-states to model the length distributions of the regions. Our study shows that a simple HMM can be used to model the isochore organisation of the mouse genome. This potential use of Markovian models as an aid to data exploration has been underestimated until now.
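The geometric constraint and the macro-state idea can be illustrated with a minimal sketch. It assumes a macro-state built from K identical states in series, each left with the same probability P at every step; the paper's actual macro-states may be structured differently. The function names and the values of K, P and N_MAX are hypothetical and chosen only for illustration. Under these assumptions, the total length of stay is a sum of K independent geometric durations, so its distribution is the K-fold convolution of the geometric law (a negative binomial law), whose mode lies away from 1.

```python
from math import comb


def geometric_pmf(n, p):
    """P(duration = n) for a single HMM state left with probability p at each step."""
    return p * (1 - p) ** (n - 1) if n >= 1 else 0.0


def convolve(a, b):
    """Discrete convolution of two probability vectors indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def macro_state_pmf(k, p, n_max):
    """Length distribution (indices 0..n_max) of k chained states, by repeated convolution."""
    single = [geometric_pmf(n, p) for n in range(n_max + 1)]
    total = single
    for _ in range(k - 1):
        # Entries up to n_max depend only on entries up to n_max, so truncating is exact.
        total = convolve(total, single)[: n_max + 1]
    return total


def negative_binomial_pmf(n, k, p):
    """Closed form for the sum of k independent geometric durations."""
    return comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k) if n >= k else 0.0


# Illustrative values only; they are not taken from the paper.
K, P, N_MAX = 5, 0.3, 100

single = macro_state_pmf(1, P, N_MAX)
macro = macro_state_pmf(K, P, N_MAX)

# Most probable length under each model.
single_mode = max(range(N_MAX + 1), key=single.__getitem__)
macro_mode = max(range(N_MAX + 1), key=macro.__getitem__)
print("mode of a single state:", single_mode)
print("mode of the macro-state:", macro_mode)
```

A single state always gives a monotonically decreasing length distribution, whereas the chained version rises to a peak and then decays, which is the bell shape mentioned above. Allowing different exit probabilities per sub-state would give more flexible shapes at the cost of more parameters.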
Cite this article
Melodelima, C., Gautier, C. & Piau, D. A markovian approach for the prediction of mouse isochores. J. Math. Biol. 55, 353–364 (2007). https://doi.org/10.1007/s00285-007-0087-5
DOI: https://doi.org/10.1007/s00285-007-0087-5