An Efficient Algorithm for the Identification of Repetitive Variable Motifs in the Regulatory Sequences of Co-expressed Genes

  • Abanish Singh
  • Nikola Stojanovic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4263)


Over the last several years there has been an explosion in the number of computational methods for detecting transcription factor binding sites in DNA sequences. Although there has been some success in this field, the existing tools are still neither sensitive nor specific enough, and usually report a large number of false positive signals. Given the properties of genomic sequences, this is not unexpected, but one can still find interesting features worthy of further computational and laboratory bench study. We present an efficient algorithm developed to find all significant variable motifs in given sequences. In our view, it is important to generate complete data, to which separate selection criteria can then be applied depending on the nature of the sites one wants to locate and their biological properties. We discuss our algorithm and our supplementary software, and conclude with an illustration of their application to two eukaryotic data sets.


Transcription Factor Binding Site; Upstream Sequence; Mixed Lineage Leukemia; Variable Motif; Positional Conservation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Abanish Singh (1)
  • Nikola Stojanovic (1)
  1. Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, USA
