Summary
This chapter aims to develop the computational theory for modeling patterns and their hierarchical coordination within biological sequences. With the exception of promoters and enhancers, the functional significance of non-coding DNA is not well understood. Scientists are now discovering that specific regions of non-coding DNA interact with the cellular machinery and help bring about the expression of genes. Our premise is that the arrangements of patterns in biological sequences can be studied with machine learning algorithms. As biological databases continue their exponential growth, it becomes feasible to apply in silico data mining algorithms to discover interesting patterns of motif arrangements and the frequency of their reiteration. A systematic procedure for achieving this goal is presented.
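To make "motif arrangements and the frequency of their reiteration" concrete, the following is a minimal sketch only. It is not the chapter's procedure, which this summary does not describe. The function names, the toy sequence, and the choice of the TATA-box and CCAAT-box consensus strings are all assumptions made for illustration. The sketch counts how often each motif occurs in a sequence and which motif tends to follow which.

```python
import re
from collections import Counter


def motif_positions(sequence, motif):
    """Return 0-based start positions of every match, including overlapping ones."""
    # A lookahead matches at each position without consuming characters,
    # so overlapping occurrences are all reported.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence)]


def motif_frequencies(sequence, motifs):
    """Map each motif to the number of times it occurs in the sequence."""
    return {motif: len(motif_positions(sequence, motif)) for motif in motifs}


def ordered_pairs(sequence, motifs):
    """Count how often one motif is immediately followed by another along the sequence."""
    hits = sorted(
        (pos, motif) for motif in motifs for pos in motif_positions(sequence, motif)
    )
    return Counter((a[1], b[1]) for a, b in zip(hits, hits[1:]))


# Hypothetical toy sequence; exact-string motifs stand in for real binding-site models.
sequence = "GGTATAAACCAATGGTATAAACCAATCC"
motifs = ["TATAAA", "CCAAT"]

frequencies = motif_frequencies(sequence, motifs)
arrangements = ordered_pairs(sequence, motifs)
print(frequencies)
print(dict(arrangements))
```

Exact string matching is the simplest possible stand-in here. Real regulatory motifs are degenerate, so a practical analysis would more likely score candidate sites with position weight matrices or profiles than with literal substrings.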
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Singh, G.B. (2009). Learning Information Patterns in Biological Databases - Stochastic Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_59
DOI: https://doi.org/10.1007/978-0-387-09823-4_59
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science