Journal of Biosciences, Volume 37, Issue 3, pp 433–444

DNA-energetics-based analyses suggest additional genes in prokaryotes

Abstract

We present a novel methodology for predicting new genes in prokaryotic genomes on the basis of the inherent energetics of DNA. Regions of higher thermodynamic stability were identified and filtered against known annotations to yield a set of potentially new genes. These candidates were then screened for compatibility with the stereochemical properties of proteins and with the tripeptide frequencies of proteins in Swiss-Prot data, which resulted in a reliable set of new genes in a genome. Surprisingly, the methodology identifies new genes even in well-annotated genomes. It can also handle genomes of any GC content, size and number of annotated genes.
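The first two steps of the abstract (finding stable regions, then filtering them against known annotations) can be illustrated with a minimal sketch. It is not the authors' implementation: the stability score here is simply the mean number of Watson-Crick hydrogen bonds per base pair in a sliding window, a crude stand-in for the DNA energetics used in the paper, and the function names, window size and threshold are all hypothetical. The later screening against stereochemical properties and tripeptide frequencies is not sketched.

```python
from typing import List, Tuple

# Watson-Crick hydrogen bonds per base pair. This is a crude stand-in for
# thermodynamic stability and NOT the energy model used in the paper.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

Interval = Tuple[int, int]  # half-open [start, end), 0-based


def window_scores(seq: str, window: int) -> List[float]:
    """Mean hydrogen-bond count for every sliding window of length `window`."""
    values = [H_BONDS[base] for base in seq.upper()]
    if len(values) < window:
        return []
    total = sum(values[:window])
    scores = [total / window]
    for i in range(window, len(values)):
        # Slide the window one base to the right.
        total += values[i] - values[i - window]
        scores.append(total / window)
    return scores


def stable_regions(seq: str, window: int, threshold: float) -> List[Interval]:
    """Merge overlapping or adjacent windows scoring >= threshold into regions."""
    regions: List[Interval] = []
    for start, score in enumerate(window_scores(seq, window)):
        if score < threshold:
            continue
        if regions and start <= regions[-1][1]:
            # Window touches the previous region: extend it.
            regions[-1] = (regions[-1][0], start + window)
        else:
            regions.append((start, start + window))
    return regions


def drop_annotated(regions: List[Interval], annotated: List[Interval]) -> List[Interval]:
    """Keep only regions that overlap no already-annotated interval."""
    return [
        r for r in regions
        if not any(r[0] < a[1] and a[0] < r[1] for a in annotated)
    ]


if __name__ == "__main__":
    toy_genome = "ATATATGCGCGCGCATATATGGCCGGCCATAT"  # made-up sequence
    candidates = stable_regions(toy_genome, window=4, threshold=3.0)
    print(drop_annotated(candidates, annotated=[(0, 10)]))
```

In this toy run the two GC-only stretches are reported as stable regions, and the first is discarded because it overlaps the hypothetical annotated interval.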

Keywords

DNA energetics; gene prediction; prokaryotes

Copyright information

© Indian Academy of Sciences 2012

Authors and Affiliations

  1. Department of Chemistry, Indian Institute of Technology, New Delhi, India
  2. Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology, New Delhi, India
  3. Kusuma School of Biological Sciences, Indian Institute of Technology, New Delhi, India
