Abstract
The distribution of n-tuplet frequencies is shown to correlate strongly with functionality when a genomic sequence is examined in a reading-frame-specific manner. The approach described herein applies a coarse-graining procedure, based on a simple summation of suitable triplet-occurrence measures, that reveals aspects of triplet usage related to protein coding while remaining species independent. These measures are ratios of simple triplet frequencies to suitable mononucleotide-frequency products, chosen to promote the incidence of the RNY motif, which is preferred in the most widely used codons. A significant distinction between coding and noncoding sequences is achieved.
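The abstract does not give the exact formula, so the following is only a minimal sketch of the general idea, under stated assumptions: triplets are counted in one reading frame, each triplet frequency is divided by the product of the overall mononucleotide frequencies of its three bases, and the ratios are summed over the 16 RNY triplets (purine, any base, pyrimidine). The function name `rny_score`, the choice of whole-sequence base frequencies as the normalizing product, and the plain unweighted sum are all illustrative assumptions rather than the authors' definition.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
PURINES = "AG"        # R
PYRIMIDINES = "CT"    # Y

# The 16 triplets matching the RNY motif: purine, any base, pyrimidine.
RNY_TRIPLETS = {r + n + y for r, n, y in product(PURINES, BASES, PYRIMIDINES)}


def rny_score(seq, frame=0):
    """Hypothetical coding-potential score for one reading frame.

    Sums, over all RNY triplets, the ratio of the in-frame triplet
    frequency to the product of the mononucleotide frequencies of the
    triplet's bases. This normalization is an assumption made for
    illustration, not the measure defined in the paper.
    """
    seq = seq.upper()

    # Mononucleotide frequencies over the whole sequence (unambiguous bases only).
    base_counts = Counter(b for b in seq if b in BASES)
    total_bases = sum(base_counts.values())
    if total_bases == 0:
        return 0.0
    base_freq = {b: base_counts[b] / total_bases for b in BASES}

    # Non-overlapping triplets read in the requested frame (0, 1, or 2).
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    triplets = [t for t in triplets if set(t) <= set(BASES)]
    if not triplets:
        return 0.0
    triplet_counts = Counter(triplets)
    n_triplets = len(triplets)

    score = 0.0
    for t in RNY_TRIPLETS:
        expected = base_freq[t[0]] * base_freq[t[1]] * base_freq[t[2]]
        if expected > 0:  # skip triplets containing a base absent from seq
            score += (triplet_counts[t] / n_triplets) / expected
    return score


if __name__ == "__main__":
    toy = "GCT" * 10  # toy sequence whose frame-0 triplets all match RNY
    for f in range(3):
        print(f, rny_score(toy, frame=f))
```

Comparing the score across the three frames of a sequence, as in the usage lines above, reflects the reading-frame-specific examination the abstract describes: under these assumptions, a frame rich in RNY triplets scores higher than the shifted frames of the same sequence.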
Additional information
Reviewing Editor: Dr. Massimo Di Giulio
About this article
Cite this article
Nikolaou, C., Almirantis, Y. Measuring the Coding Potential of Genomic Sequences Through a Combination of Triplet Occurrence Patterns and RNY Preference. J Mol Evol 59, 309–316 (2004). https://doi.org/10.1007/s00239-004-2626-7
DOI: https://doi.org/10.1007/s00239-004-2626-7