An Efficient Algorithm for the Identification of Repetitive Variable Motifs in the Regulatory Sequences of Co-expressed Genes
Over the last several years there has been an explosion in the number of computational methods for the detection of transcription factor binding sites in DNA sequences. Although there has been some success in this field, the existing tools are still neither sensitive nor specific enough, and they usually report a large number of false positive signals. Given the properties of genomic sequences, this is not unexpected, but one can still find interesting features worthy of further computational and laboratory bench study. We present an efficient algorithm developed to find all significant variable motifs in the given sequences. In our view, it is important to generate complete data, to which separate selection criteria can then be applied depending on the nature of the sites one wants to locate and their biological properties. We discuss our algorithm and our supplementary software, and conclude with an illustration of their application to two eukaryotic data sets.
Keywords: Transcription Factor Binding Site; Upstream Sequence; Mixed Lineage Leukemia; Variable Motif; Positional Conservation