In silico Identification of Eukaryotic Promoters

  • Venkata Rajesh Yella
  • Manju Bansal


The identification of promoters is essential for complete annotation of genomes and for a better understanding of gene regulatory networks. Experimental methods for promoter identification are costly, time-consuming and labor-intensive. Hence, in silico methods are an attractive alternative: computational promoter prediction is easy to apply, fast and can provide reliable results. A promoter prediction algorithm identifies promoter regions on the premise that they differ from other genomic regions in their features (sequence, context and structure). Promoter prediction algorithms are broadly classified as ab initio, hybrid and homology-based, depending on the information used for model design. The different approaches used in promoter prediction are briefly described here.
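As a rough illustration of the premise that promoter regions differ from the rest of the genome in measurable features, the sketch below scans a sequence with a sliding window and flags windows that look CpG-island-like. It is not the method of any program named in this chapter. The function names, window size, step and thresholds (GC fraction above 0.5, observed/expected CpG ratio above 0.6) are assumptions chosen for the example, loosely following commonly quoted CpG island criteria.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: N(CG) * L / (N(C) * N(G))."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def scan_windows(seq, window=200, step=50, min_gc=0.5, min_oe=0.6):
    """Return (start, end, gc, oe) for each window exceeding both thresholds.

    The thresholds are illustrative assumptions, not values taken from
    the chapter.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        sub = seq[start:start + window]
        gc = gc_content(sub)
        oe = cpg_obs_exp(sub)
        if gc > min_gc and oe > min_oe:
            hits.append((start, start + window, gc, oe))
    return hits


if __name__ == "__main__":
    # Toy sequence: a CpG-rich stretch followed by an AT-rich stretch.
    toy = "CG" * 100 + "AT" * 100
    for start, end, gc, oe in scan_windows(toy, window=200, step=100):
        print(f"{start}-{end}: GC={gc:.2f}, CpG o/e={oe:.2f}")
```

A real predictor would combine several such features (sequence signals, context and structural properties) and train a classifier on annotated promoters; this single-feature filter only shows the general shape of a feature-based scan.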


Keywords: Promoter prediction programs; FirstEF; CpGProD; Eponine; PromoterInspector; PromPredict; EP3; PromH



MB is a recipient of the J. C. Bose National Fellowship of DST, India. We thank Rajasekaran for assistance in the preparation of Fig. 4.1.



Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. Molecular Biophysics Unit, Indian Institute of Science, Bengaluru, India
