Learning Information Patterns in Biological Databases - Stochastic Data Mining

Singh, Gautam B.

doi:10.1007/978-0-387-09823-4_59

Summary

This chapter aims to develop a computational theory for modeling patterns and their hierarchical coordination within biological sequences. With the exception of promoters and enhancers, the functional significance of non-coding DNA is not well understood. Scientists are now discovering that specific regions of non-coding DNA interact with the cellular machinery and help bring about the expression of genes. Our premise is that the arrangements of patterns in biological sequences can be studied with machine learning algorithms. As biological databases continue their exponential growth, it becomes feasible to apply in silico data mining algorithms to discover interesting patterns of motif arrangements and the frequency of their reiteration. A systematic procedure for achieving this goal is presented.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Center for Bioinformatics, Oakland University, Rochester, MI, 48309, USA
Gautam B. Singh

Editor information

Editors and Affiliations

Dept. Industrial Engineering, Tel Aviv University, Ramat Aviv, 69978, Israel
Oded Maimon
Dept. Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
Lior Rokach

About this chapter

Cite this chapter

Singh, G.B. (2009). Learning Information Patterns in Biological Databases - Stochastic Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_59

DOI: https://doi.org/10.1007/978-0-387-09823-4_59
Published: 26 June 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09822-7
Online ISBN: 978-0-387-09823-4
eBook Packages: Computer Science, Computer Science (R0)
