
Learning Information Patterns in Biological Databases - Stochastic Data Mining

  • Chapter
Data Mining and Knowledge Discovery Handbook

Summary

This chapter aims to develop a computational theory for modeling patterns and their hierarchical coordination within biological sequences. With the exception of promoters and enhancers, the functional significance of non-coding DNA is not well understood. Scientists are now discovering that specific regions of non-coding DNA interact with the cellular machinery and help bring about the expression of genes. Our premise is that the arrangements of patterns in biological sequences can be studied through machine learning algorithms. As biological databases continue their exponential growth, it becomes feasible to apply in silico data mining algorithms to discover interesting patterns of motif arrangements and the frequency of their reiteration. A systematic procedure for achieving this goal is presented.


Copyright information

© 2009 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Singh, G.B. (2009). Learning Information Patterns in Biological Databases - Stochastic Data Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09823-4_59

  • DOI: https://doi.org/10.1007/978-0-387-09823-4_59

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-09822-7

  • Online ISBN: 978-0-387-09823-4

  • eBook Packages: Computer Science, Computer Science (R0)
