Transcription Factor Binding Sites Prediction Based on Sequence Similarity

  • Jeong Seop Sim
  • Soo-Jun Park
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4223)


Sequence algorithms are widely used to study genomic sequences in areas such as DNA fragment assembly, genomic sequence similarity analysis, and motif search. In this paper, we propose an algorithm that predicts transcription factor binding sites from a given set of upstream-region sequences of genes, using two sequence-algorithm tools: suffix arrays and the Smith-Waterman algorithm.
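The abstract names two building blocks, suffix arrays and the Smith-Waterman algorithm, but does not describe how the paper combines them. The following is only a minimal sketch of each building block on its own, not the authors' prediction procedure. The function names (build_suffix_array, find_occurrences, smith_waterman_score) and the scoring parameters (match=2, mismatch=-1, gap=-1) are illustrative assumptions, and the suffix array is built by naive sorting rather than by a linear-time construction.

```python
def build_suffix_array(text):
    """Return start positions of all suffixes of text in lexicographic order.

    Assumption: naive construction by sorting suffixes, suitable only for
    short illustrative inputs (not a linear-time algorithm).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions of pattern in text via binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of a and b.

    Assumption: linear gap penalty and illustrative scoring values.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, on the toy sequence "ACGTACGT" the pattern "CGT" is found at positions 1 and 5 (0-based), and two identical copies of "ACGT" score 8 under the assumed parameters. In a motif-search setting, exact lookups of this kind could plausibly supply candidate regions that a local alignment then scores, but that pairing is a guess at the general idea, not something the abstract states.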


Keywords: Transcription Factor Binding Site; Input Sequence; Sequence Algorithm; Motif Search; Suffix Array





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jeong Seop Sim (1)
  • Soo-Jun Park (2)
  1. School of Computer Science and Engineering, Inha University, Incheon, Korea
  2. Electronics and Telecommunications Research Institute, Daejeon, Korea
