Abstract
We analyse the sequential structure of human genomic DNA using hidden Markov models. We apply models of widely different design: conventional left-right constructs and models with a built-in periodic architecture. The models are trained on segments of DNA sequence extracted so that they cover either complete internal exons flanked by introns, or splice sites flanked by coding and non-coding sequence. Taken together, the models of donor site regions, acceptor site regions and flanked internal exons show that exons hold a specific periodic pattern in addition to the reading frame. The pattern has the consensus non-T(A/T)G and a minimal periodicity of roughly 10 nucleotides.
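To make the reported consensus concrete, the following minimal sketch scans a sequence for non-T(A/T)G and tabulates the distances between matches. It is not the hidden Markov model approach of the chapter: it is a plain regular-expression scan, and the function names, the `max_dist` parameter and the toy sequence are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Consensus from the abstract, non-T(A/T)G, written as a regular expression.
# The zero-width lookahead lets overlapping matches (e.g. AAGAG) be reported.
PATTERN = re.compile(r"(?=[ACG][AT]G)")

def match_positions(seq):
    """Return the 0-based start position of every non-T(A/T)G match in seq."""
    return [m.start() for m in PATTERN.finditer(seq.upper())]

def distance_histogram(positions, max_dist=30):
    """Count pairs of matches lying d nucleotides apart, for 1 <= d <= max_dist."""
    counts = Counter()
    for i, p in enumerate(positions):
        for q in positions[i + 1:]:
            d = q - p
            if d > max_dist:
                break  # positions are sorted, so later ones are even further away
            counts[d] += 1
    return counts

# Toy sequence (hypothetical, not real exon data): the motif AAG every 10 nt.
toy = ("AAG" + "CCCCCCC") * 4
positions = match_positions(toy)
hist = distance_histogram(positions)
```

On real exon sequences one would look for an excess of counts near multiples of about 10 in such a histogram. A simple scan like this ignores the reading frame and the base composition, which the trained models account for.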
Jet Propulsion Laboratory, Caltech.
Department of Psychology, Stanford University.
Copyright information
© 1997 Springer Science+Business Media New York
About this chapter
Cite this chapter
Baldi, P., Brunak, S., Chauvin, Y., Krogh, A. (1997). Hidden Markov Models for Human Genes. In: Suhai, S. (eds) Theoretical and Computational Methods in Genome Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5903-0_2
DOI: https://doi.org/10.1007/978-1-4615-5903-0_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7708-5
Online ISBN: 978-1-4615-5903-0
eBook Packages: Springer Book Archive