Using Varying Negative Examples to Improve Computational Predictions of Transcription Factor Binding Sites

Rezwan, Faisal; Sun, Yi; Davey, Neil; Adams, Rod; Rust, Alistair G.; Robinson, Mark

doi:10.1007/978-3-642-32909-8_24

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 311)

Abstract

The identification of transcription factor binding sites (TFBSs) is a non-trivial problem, as existing computational predictors produce many false predictions. Although it has been shown that combining these predictions with a meta-classifier, such as a Support Vector Machine (SVM), can improve the overall results, the improvement is not as large as expected. The reason is that the predictors are unreliable on the negative examples, that is, the non-binding sites in the promoter region. Using negative examples from other sources when training an SVM may therefore be one solution to this problem. In this study, we used different types of negative examples when training the classifier: examples taken from regions far away from the promoters, examples produced by randomisation, and examples from the intronic regions of genes. By using these negative examples during training, we observed their effect on the prediction of TFBSs in yeast. We also used a modified cross-validation method for this type of problem. We observed a substantial improvement in classifier performance that could constitute a model for predicting TFBSs. The major contribution of the analysis is therefore that, for the yeast genome, the positions of binding sites can be predicted with high confidence using our technique, and these predictions are of much higher quality than those of the original prediction algorithms.
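Training the meta-classifier with negative examples drawn from other sources can be sketched as a cross-validation split. The sketch below is a minimal illustration only. It assumes that the extra negatives (distal, randomised or intronic) are added to the training folds and never to the test fold, so that every fold is evaluated on the original promoter data. The `Example` type and the function `cv_folds_with_extra_negatives` are hypothetical names, not taken from the paper. The sketch is not the authors' exact procedure.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical representation: one example per sequence position, holding the
# scores that the individual TFBS predictors assign to it, plus a label
# (1 = annotated binding site, 0 = non-binding).
Example = Tuple[List[float], int]


def cv_folds_with_extra_negatives(
    original: Sequence[Example],
    extra_negatives: Sequence[Example],
    k: int,
    seed: int = 0,
) -> Iterator[Tuple[List[Example], List[Example]]]:
    """Yield (train, test) pairs for k-fold cross-validation.

    Assumption: negatives from other sources (distal, randomised or intronic)
    are appended to every training set but never to a test set, so each fold
    is scored only on the original promoter data.
    """
    if not 2 <= k <= len(original):
        raise ValueError("k must be between 2 and the number of original examples")
    indices = list(range(len(original)))
    random.Random(seed).shuffle(indices)  # reproducible random assignment to folds
    folds = [indices[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        test = [original[i] for i in held_out]
        train = [original[i] for i in indices if i not in held]
        train.extend(extra_negatives)  # extra negatives are used for training only
        yield train, test
```

For example, with 10 original examples, 3 extra negatives and k = 5, each training set holds 8 + 3 = 11 examples and each test set holds 2. Neighbouring positions in a genome are correlated, so a practical split might assign contiguous blocks rather than random positions to folds. Random assignment is used here only to keep the sketch short.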

Author information

Authors and Affiliations

School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, AL10 9AB, UK
Faisal Rezwan, Yi Sun, Neil Davey & Rod Adams
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
Alistair G. Rust
Benaroya Research Institute at Virginia Mason, 1201 9th Avenue, Seattle, WA, 98101, USA
Mark Robinson

Editor information

Editors and Affiliations

Coventry University, Priory Street, CV1 5FB, Coventry, UK
Chrisina Jayne
University of Lincoln, LN6 7TS, Lincoln, UK
Shigang Yue
University of Thrace, 193 Pandazidou st., 68200 N, Orestiada, Greece
Lazaros Iliadis

About this paper

Cite this paper

Rezwan, F., Sun, Y., Davey, N., Adams, R., Rust, A.G., Robinson, M. (2012). Using Varying Negative Examples to Improve Computational Predictions of Transcription Factor Binding Sites. In: Jayne, C., Yue, S., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2012. Communications in Computer and Information Science, vol 311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32909-8_24

DOI: https://doi.org/10.1007/978-3-642-32909-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32908-1
Online ISBN: 978-3-642-32909-8
eBook Packages: Computer Science, Computer Science (R0)