Improving Transcription Factor Binding Site Predictions by Using Randomised Negative Examples

  • Faisal Rezwan
  • Yi Sun
  • Neil Davey
  • Rod Adams
  • Alistair G. Rust
  • Mark Robinson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7223)


Much of the genetic change underlying morphological evolution is known to take place in cis-regulatory regions rather than in the coding regions of genes. Identifying these regulatory sites in a genome is a non-trivial problem. Experimental methods for finding binding sites exist, but they are limited in applicability, accuracy, availability or cost. Computational prediction algorithms, on the other hand, perform rather poorly. The aim of this research is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms with other sources of complementary biological evidence, with particular emphasis on the use of the Support Vector Machine (SVM). Data from two organisms, yeast and mouse, were used in this study. The initial results were not particularly encouraging: the predictions were still of low quality. However, when the vectors labelled as non-binding sites in the training set were replaced by randomised training vectors, a significant improvement in performance was observed. The improvement was substantial for the yeast data and even greater for the mouse data. In fact, the resulting classifier found over 80% of the binding sites in the test set, and 80% of its predictions were correct.
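The two ideas in the abstract, replacing the negative training vectors with randomised ones and scoring the classifier by the fraction of binding sites found (recall) and the fraction of predictions that are correct (precision), can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the randomised vectors were generated, so the uniform sampling used here is an assumption, as are the function names `randomise_negatives` and `recall_precision`. SVM training itself is omitted.

```python
import random


def randomise_negatives(vectors, labels, seed=0):
    """Replace every vector labelled 0 (non-binding) with a random vector.

    ASSUMPTION: each random component is drawn uniformly between the
    minimum and maximum observed for that feature. The abstract does not
    specify the randomisation scheme actually used.
    """
    rng = random.Random(seed)
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    new_vectors = []
    for vec, label in zip(vectors, labels):
        if label == 1:
            # Binding-site (positive) vectors are kept unchanged.
            new_vectors.append(list(vec))
        else:
            new_vectors.append(
                [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
            )
    return new_vectors, list(labels)


def recall_precision(true_labels, predicted_labels):
    """Return (recall, precision) for the positive class (label 1)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Illustrative counts only (not data from the study): 100 true binding
# sites of which 80 are found, plus 20 false positives.
truth = [1] * 100 + [0] * 20
preds = [1] * 80 + [0] * 20 + [1] * 20
recall, precision = recall_precision(truth, preds)
```

With these made-up counts both recall and precision equal 0.8, which is the kind of operating point the abstract describes. The randomised training set would be passed to whatever SVM implementation is in use, in place of the original one.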


Support Vector Machine · Transcription Factor Binding Site · Prediction Algorithm · Confusion Matrix · Minority Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Faisal Rezwan
    • 1
  • Yi Sun
    • 1
  • Neil Davey
    • 1
  • Rod Adams
    • 1
  • Alistair G. Rust
    • 2
  • Mark Robinson
    • 3
  1. School of Computer Science, University of Hertfordshire, Hatfield, UK
  2. Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
  3. Benaroya Research Institute at Virginia Mason, USA
