Abstract
A variety of patterns have been observed in DNA and protein sequences that serve as control points for gene expression and cellular functions. Owing to the vital role of such patterns, they are generally cataloged and maintained within internationally shared databases. Furthermore, the variability in a family of observed patterns is often represented using computational models in order to facilitate their search within an uncharacterized biological sequence. As biological data comprises a mosaic of sequence-level motifs, it is important to unravel the synergies of macromolecular coordination utilized in the cell-specific differential synthesis of proteins. This article provides an overview of the various pattern representation methodologies and surveys the pattern databases available to molecular biologists. Our aim is to describe the principles behind the computational modeling and analysis techniques utilized in bioinformatics research, with the objective of providing the insight necessary to better understand and effectively utilize the available databases and analysis tools. We also provide a detailed review of the DNA sequence-level patterns responsible for structural conformations within the Scaffold or Matrix Attachment Regions (S/MARs).
Singh, G.B., Singh, H. Databases, models, and algorithms for functional genomics. Mol Biotechnol 29, 165–183 (2005). https://doi.org/10.1385/MB:29:2:165
DOI: https://doi.org/10.1385/MB:29:2:165