Predicting Transcription Factor Binding Sites in DNA Sequences Without Prior Knowledge

  • Wook Lee
  • Byungkyu Park
  • Daesik Choi
  • Chungkeun Lee
  • Hanju Chae
  • Kyungsook Han
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9771)


Transcription factors are proteins that take part in transcribing DNA into RNA by binding to specific regions of DNA. Many computational methods developed for predicting transcription factor binding sites in DNA are either tissue-specific or species-specific, so they cannot be used without prior knowledge of the tissue or species. Some prediction methods are limited to short DNA sequences, so they cannot be used to find potential transcription factor binding sites in long DNA sequences. In this study, we developed a new method that predicts transcription factor binding sites in DNA sequences of any length without prior knowledge of tissue or species. In independent testing with datasets that were not used to train the method, it achieved reasonably good performance: an accuracy of 81.84% and a Matthews correlation coefficient (MCC) of 0.634 in one test, and an accuracy of 71.16% and an MCC of 0.403 in another. Our method will be useful for finding putative transcription factor binding sites in the absence of prior knowledge of tissue or species.
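The two measures reported above, accuracy and MCC, are both computed from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard formulas, not the authors' evaluation code. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper):
    # binding sites predicted correctly (tp), non-binding sites predicted
    # correctly (tn), false alarms (fp), and missed binding sites (fn).
    tp, tn, fp, fn = 80, 90, 10, 20
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when binding and non-binding examples are unevenly represented. This may be why the abstract reports both measures.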


Keywords: Transcription factor binding site; Protein-DNA interaction



This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (2015R1A1A3A04001243), and in part by the international cooperation program managed by the NRF (2014K2A2A2000670).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Wook Lee (1)
  • Byungkyu Park (1)
  • Daesik Choi (1)
  • Chungkeun Lee (1)
  • Hanju Chae (1)
  • Kyungsook Han (1, corresponding author)
  1. Department of Computer Science and Engineering, Inha University, Incheon, South Korea
