Novel Approach to Predict Promoter Region Based on Short Range Interaction Between DNA Sequences

  • Arul Mugilan
  • Abraham Nartey
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 236)


Genomic studies have become one of the useful areas of bioinformatics because they provide important information about an organism’s genome once it has been sequenced. Gene finding and promoter prediction are common strategies in modern bioinformatics that help provide this genomic information. Much work has been carried out on promoter prediction, and many prediction tools are therefore available. However, there is still high demand for new prediction tools because existing ones have low accuracy and sensitivity, which are important features of a good prediction tool. In this paper, we present a new algorithm, Novel Approach to Promoter Prediction (NAPPR), implemented in Python, to predict eukaryotic promoter regions; it meets this demand to some extent. We developed parameters for singlets (\(4^{1}\)) through nanoplets (\(4^{9}\)) to analyze short-range interactions between the four nucleotide bases in DNA sequences. Using these parameters, the NAPPR tool was developed to predict promoters, and it showed a high level of accuracy, sensitivity, and specificity when compared with other known prediction tools. An accuracy of 74 % and a specificity of 78 % were achieved on test sequences from the EPD database. There is no limit on the length of the input DNA sequence, so the tool can be used to predict promoters even across the whole human genome. In conclusion, NAPPR can predict eukaryotic promoters with a high level of accuracy and sensitivity.
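The abstract does not give the NAPPR parameters or scoring procedure, so the following is only a minimal sketch of the two generic ideas it mentions: counting overlapping windows of 1 to 9 bases (there are \(4^{k}\) possible windows of length \(k\)), and computing accuracy, sensitivity, and specificity from a confusion matrix. The function names, the choice to skip windows containing non-ACGT symbols, and the example counts are assumptions made for illustration; none of this is the authors' implementation.

```python
from collections import Counter

BASES = "ACGT"


def possible_kmers(k):
    """Number of distinct length-k windows over the four bases, i.e. 4**k."""
    return len(BASES) ** k


def count_kmers(sequence, k):
    """Count overlapping length-k windows in a DNA sequence.

    Assumption: windows containing any symbol other than A, C, G, T
    (for example N) are skipped rather than counted.
    """
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(base in BASES for base in window):
            counts[window] += 1
    return counts


def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    Assumes each denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Size of the window space for singlets (k=1) through k=9.
    for k in range(1, 10):
        print(k, possible_kmers(k))

    # Doublet counts for a short, made-up sequence.
    print(count_kmers("ACGTAC", 2))

    # Hypothetical confusion-matrix counts, not results from the paper.
    print(classification_metrics(tp=8, tn=7, fp=3, fn=2))
```

Because the window space grows as \(4^{k}\), a table of parameters for length-9 windows already has 262,144 entries, which is one practical reason to stop at short-range windows.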


Keywords: Positional score value, Nanoplets, Short-range interactions, Expected promoter region



Glory be to God Almighty for making it possible for us to carry out this project. We also thank Karunya University for its support and for providing the facilities for this research project.



Copyright information

© Springer India 2014

Authors and Affiliations

  1. Department of Bioinformatics, School of Health Science and Biotechnology, Karunya University, Coimbatore, India
  2. Department of Theoretical and Applied Biology, Kwame Nkrumah University of Science and Technology, College of Science, Kumasi, Ghana
