Theoretical Chemistry Accounts

Volume 125, Issue 3–6, pp 583–591

PromoterSweep: a tool for identification of transcription factor binding sites

  • Coral del Val
  • Oliver Pelz
  • Karl-Heinz Glatting
  • Endre Barta
  • Agnes Hotz-Wagenblatt
Regular Article

Abstract

Many tools are available for predicting potential promoter regions and the transcription factor binding sites (TFBS) they harbour. Unfortunately, these tools cannot avoid predicting large numbers of false positives, which is the greatest problem in promoter analysis. Combining different methods and algorithms has improved prediction accuracy for similar biological problems such as gene prediction. The web tool presented here uses this approach to perform an exhaustive integrative analysis, identification and annotation of potential promoter regions. The methods combined include searches in different experimental promoter databases to identify promoter regions and their orthologs, the use of TFBS databases and search tools, and a phylogenetic footprinting strategy that couples multiple alignment of genomic sequences with motif discovery tools, which were tested beforehand to find the best combination of methods. The pipeline integrates all of this information and identifies, among the huge number of TFBS predictions, those that are more likely to be functional. It is available to academic users at the HUSAR open server: http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar/.
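
The abstract does not spell out how the different lines of evidence are combined, so the following is only a minimal sketch of the general idea behind phylogenetic footprinting as a filter: predicted TFBS are kept only when they lie inside a region that is conserved across orthologous sequences. The `Site` type, the function names and the coordinates are hypothetical illustrations, not part of PromoterSweep.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    """A predicted binding site (hypothetical structure, for illustration)."""
    start: int
    end: int
    factor: str


def within_conserved(site: Site, blocks: List[Tuple[int, int]]) -> bool:
    """True if the site lies entirely inside at least one conserved block."""
    return any(b_start <= site.start and site.end <= b_end
               for b_start, b_end in blocks)


def filter_sites(sites: List[Site], blocks: List[Tuple[int, int]]) -> List[Site]:
    """Keep only the predictions supported by sequence conservation."""
    return [s for s in sites if within_conserved(s, blocks)]


# Made-up example: three predictions and two conserved blocks.
predictions = [Site(10, 20, "SP1"), Site(50, 60, "AP1"), Site(95, 110, "NFKB")]
conserved_blocks = [(5, 30), (90, 100)]
supported = filter_sites(predictions, conserved_blocks)
```

In this made-up example only the first prediction survives: the second overlaps no conserved block, and the third extends past the end of one. A real pipeline would presumably weigh several kinds of evidence rather than apply a single hard cut-off.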

Keywords

Promoter, Transcription factor, Motif discovery, Annotation

Abbreviations

TFBS: Transcription factor binding site
TSS: Transcriptional start site
ID: Identifier
TP: True positive
TN: True negative
FP: False positive
FN: False negative
SN: Sensitivity
SP: Specificity
CC: Correlation coefficient
XML: Extensible markup language
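
The abbreviations TP, TN, FP, FN, SN, SP and CC are the usual ingredients for assessing prediction accuracy. The definitions used by the authors are not given on this page, so the sketch below assumes the common textbook forms: SN = TP/(TP+FN), SP = TN/(TN+FP), and CC as the Matthews correlation coefficient. Some promoter and gene prediction assessments instead define SP as TP/(TP+FP). The function name `prediction_metrics` is hypothetical.

```python
import math
from typing import Dict


def prediction_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute SN, SP and CC from confusion-matrix counts.

    Assumed definitions (not confirmed by the source):
      SN = TP / (TP + FN)
      SP = TN / (TN + FP)
      CC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    A ratio with a zero denominator is reported as 0.0.
    """
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "CC": cc}


# Made-up counts, for illustration only.
metrics = prediction_metrics(tp=8, tn=85, fp=5, fn=2)
```

CC summarises all four counts in a single value between -1 and 1, which makes it useful when SN and SP pull in opposite directions.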

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Coral del Val (1, 2)
  • Oliver Pelz (1)
  • Karl-Heinz Glatting (1)
  • Endre Barta (1, 3, 4)
  • Agnes Hotz-Wagenblatt (1)

  1. Molecular Biophysics, German Cancer Research Center (DKFZ), Heidelberg, Germany
  2. Computer Science and Artificial Intelligence, Informatics Faculty, University of Granada, Granada, Spain
  3. Agricultural Biotechnology Center, Gödöllő, Hungary
  4. Apoptosis and Genomics Research Group of the Hungarian Academy of Sciences, Research Center for Molecular Medicine, Medical and Health Science Center, University of Debrecen, Debrecen, Hungary
