Measuring the Coding Potential of Genomic Sequences Through a Combination of Triplet Occurrence Patterns and RNY Preference

Journal of Molecular Evolution

Abstract

The distribution of n-tuplet frequencies is shown to correlate strongly with functionality when a genomic sequence is examined in a reading-frame-specific manner. The approach described here applies a coarse-graining procedure that reveals aspects of triplet usage related to protein coding while remaining species independent. It is based on a simple summation of suitable triplet occurrence measures. These quantities are ratios of simple frequencies to suitable mononucleotide-frequency products, chosen to favor the RNY motif (purine, any nucleotide, pyrimidine), which is preferred in the most widely used codons. A significant distinction between coding and noncoding sequences is achieved.
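The abstract does not give the exact formula, so the following is only a minimal sketch of the general idea, under stated assumptions: triplets are read without overlap in a chosen frame, each RNY triplet's observed frequency is divided by the product of the mononucleotide frequencies of its three bases, and the ratios are summed. The names `frame_triplets` and `rny_score`, the use of whole-sequence base composition as the background, and the plain unweighted sum are all illustrative choices, not the authors' published measure.

```python
from collections import Counter
from itertools import product

PURINES = "AG"        # R
PYRIMIDINES = "CT"    # Y
# All 16 triplets of the form purine / any base / pyrimidine.
RNY_TRIPLETS = ["".join(t) for t in product(PURINES, "ACGT", PYRIMIDINES)]


def frame_triplets(seq, frame):
    """Return non-overlapping triplets read from offset `frame` (0, 1 or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def rny_score(seq, frame):
    """Sum of observed/expected ratios over RNY triplets in one frame.

    Assumption: "expected" is the product of whole-sequence mononucleotide
    frequencies. This is a hypothetical stand-in for the paper's quantity.
    """
    seq = seq.upper()
    # Ignore triplets containing ambiguous symbols such as N.
    triplets = [t for t in frame_triplets(seq, frame) if set(t) <= set("ACGT")]
    if not triplets:
        return 0.0

    base_counts = Counter(b for b in seq if b in "ACGT")
    total_bases = sum(base_counts.values())
    base_freq = {b: base_counts[b] / total_bases for b in "ACGT"}

    triplet_counts = Counter(triplets)
    score = 0.0
    for t in RNY_TRIPLETS:
        expected = base_freq[t[0]] * base_freq[t[1]] * base_freq[t[2]]
        if expected > 0:  # skip triplets whose bases never occur
            observed = triplet_counts[t] / len(triplets)
            score += observed / expected
    return score


if __name__ == "__main__":
    toy = "GCT" * 10  # toy repeat; GCT is an RNY triplet in frame 0 only
    for frame in range(3):
        print(frame, rny_score(toy, frame))
```

On the toy repeat, frame 0 yields a large score while frames 1 and 2 yield zero, since the shifted triplets CTG and TGC do not match RNY. Real sequences would give far less extreme contrasts, and any threshold separating coding from noncoding regions would have to be calibrated on annotated data.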


Author information

Correspondence to Yannis Almirantis.

Additional information

Reviewing Editor: Dr. Massimo Di Giulio

Cite this article

Nikolaou, C., Almirantis, Y. Measuring the Coding Potential of Genomic Sequences Through a Combination of Triplet Occurrence Patterns and RNY Preference. J Mol Evol 59, 309–316 (2004). https://doi.org/10.1007/s00239-004-2626-7

  • DOI: https://doi.org/10.1007/s00239-004-2626-7
