Abstract
Many tools are available for predicting potential promoter regions and the transcription factor binding sites (TFBS) they harbour. Unfortunately, these tools cannot avoid predicting large numbers of false positives, which is the greatest problem in promoter analysis. Combining different methods and algorithms has been shown to improve prediction accuracy for similar biological problems such as gene prediction. The web tool presented here uses this approach to perform an exhaustive integrative analysis, identification and annotation of potential promoter regions. The combination of methods includes searches in different experimental promoter databases to identify promoter regions and their orthologs, the use of TFBS databases and search tools, and a phylogenetic footprinting strategy that combines multiple alignment of genomic sequences with motif discovery tools, which were tested beforehand to find the best combination of methods. The pipeline is available for academic users at the HUSAR open server http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar/. It integrates all of this information and identifies, among the huge number of TFBS predictions, those that are more likely to be functional.
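The idea of combining several prediction methods to reduce false positives can be illustrated with a minimal sketch. It is not the PromoterSweep implementation: the `Site` record, the tool labels, the coordinates and the `min_tools` threshold are all hypothetical, and simple overlap voting is only one of several possible ways to integrate predictions.

```python
from typing import NamedTuple


class Site(NamedTuple):
    """One predicted binding site (hypothetical record layout)."""
    tool: str    # label of the method that produced the prediction
    factor: str  # transcription factor name
    start: int   # 0-based, inclusive
    end: int     # exclusive


def overlaps(a: Site, b: Site, min_overlap: int = 1) -> bool:
    """True if the two intervals share at least min_overlap positions."""
    return min(a.end, b.end) - max(a.start, b.start) >= min_overlap


def consensus_sites(sites: list[Site], min_tools: int = 2) -> list[Site]:
    """Keep predictions for which at least min_tools distinct tools
    report an overlapping site for the same factor."""
    kept = []
    for s in sites:
        supporting_tools = {
            t.tool for t in sites if t.factor == s.factor and overlaps(s, t)
        }
        if len(supporting_tools) >= min_tools:
            kept.append(s)
    return kept


# Made-up example predictions from two hypothetical methods.
predictions = [
    Site("database_scan", "SP1", 10, 20),
    Site("motif_discovery", "SP1", 15, 25),
    Site("database_scan", "TBP", 40, 46),  # supported by one tool only
]
supported = consensus_sites(predictions)
```

In this toy input only the two overlapping SP1 predictions survive, because the TBP site is reported by a single method. Raising `min_tools` trades sensitivity for fewer false positives.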
Abbreviations
- TFBS: Transcription factor binding site
- TSS: Transcriptional start site
- ID: Identifier
- TP: True positive
- TN: True negative
- FP: False positive
- FN: False negative
- SN: Sensitivity
- SP: Specificity
- CC: Correlation coefficient
- XML: Extensible markup language
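The evaluation terms in the abbreviation list (TP, TN, FP, FN, SN, SP, CC) are related by standard formulas, sketched below. The definitions are assumptions, since this section does not state which ones the authors use: sensitivity is taken as TP/(TP+FN), specificity as TN/(TN+FP), and the correlation coefficient as the Matthews correlation coefficient. Some motif-discovery benchmarks instead report TP/(TP+FP) under the name specificity, so the definition should be checked against the full text.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """SN = TP / (TP + FN); returns 0.0 when there are no positives."""
    total = tp + fn
    return tp / total if total else 0.0


def specificity(tn: int, fp: int) -> float:
    """SP = TN / (TN + FP) (assumed definition); 0.0 when undefined."""
    total = tn + fp
    return tn / total if total else 0.0


def correlation_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (assumed meaning of CC);
    returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up counts, for illustration only.
sn = sensitivity(tp=8, fn=2)
sp = specificity(tn=90, fp=10)
cc = correlation_coefficient(tp=8, tn=90, fp=10, fn=2)
```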
Additional information
Dedicated to Professor Sandor Suhai on the occasion of his 65th birthday and published as part of the Suhai Festschrift Issue.
Cite this article
del Val, C., Pelz, O., Glatting, KH. et al. PromoterSweep: a tool for identification of transcription factor binding sites. Theor Chem Acc 125, 583–591 (2010). https://doi.org/10.1007/s00214-009-0643-8