TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes

  • Original Article
  • Molecular Genetics and Genomics

Abstract

Genomes can be considered combinations of the 16 possible dinucleotides, and analysing the relative abundance of each dinucleotide may reveal important features of genome evolution. In the present study, we conducted extensive surveys of the relative abundances of dinucleotides in various genomic components of 28 bacterial, 20 archaeal, 19 fungal, 24 plant and 29 animal species. We found that TA, GT and AC are significantly under-represented in the open reading frames of all organisms examined, and in the intergenic regions and introns of most of them. The usage of specific dinucleotides varies greatly among codon positions. The significant under-representation of TA, GT and AC is considered an evolutionary consequence of preventing the formation of premature stop codons and of reducing intron-splicing options in candidate primary mRNA sequences. These data suggest that TA and GT were reduced on both strands of the DNA sequence at an early stage of de novo gene birth. Interestingly, GT and AC are also significantly under-represented in present-day prokaryotic genomes, suggesting that ancient prokaryotic protein-coding genes might have contained introns. The greatly varied usage of specific dinucleotides at different codon positions is considered an evolutionary accommodation that compensates for the unavailability of specific codons and avoids the formation of premature stop codons. This is the first report to use dinucleotide relative abundance data to indicate the possible existence of spliceosomal introns in ancient prokaryotic genes and to propose hypotheses about the early steps of de novo gene birth.
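
The abstract does not spell out how relative abundance was measured. A standard definition (Karlin and Burge 1995) compares the observed frequency of a dinucleotide XY with the frequency expected from its component bases, rho_XY = f_XY / (f_X * f_Y), so values well below 1 indicate under-representation. The Python sketch below illustrates that idea under stated assumptions and is not the authors' pipeline: the function names, the skipping of non-ACGT characters, and the way expected frequencies are computed for codon positions 1-2, 2-3 and 3-1 (3-1 spanning the junction between adjacent codons) are hypothetical choices. The reverse_complement helper makes the strand argument concrete: GT on one strand reads as AC on the other, while TA is its own reverse complement, so depleting TA and GT on both strands would leave TA, GT and AC all scarce on the coding strand. For context, TA begins the stop codons TAA and TAG, and GT is the canonical first dinucleotide of spliceosomal GT-AG introns.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 16 dinucleotides: AA, AC, ..., TT
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def relative_abundance(seq: str) -> dict[str, float]:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y) for every dinucleotide.

    f_X: frequency of base X among A/C/G/T bases.
    f_XY: frequency of XY among overlapping adjacent pairs.
    Pairs touching a non-ACGT character (e.g. N) are skipped (an assumption).
    rho is NaN when X or Y never occurs, since the expectation is then zero.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_pair = sum(mono.values()), sum(pairs.values())
    if n_pair == 0:
        raise ValueError("sequence contains no A/C/G/T dinucleotides")
    rho = {}
    for xy in DINUCLEOTIDES:
        expected = (mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono)
        rho[xy] = (pairs[xy] / n_pair) / expected if expected else float("nan")
    return rho


def codon_position_abundance(orf: str) -> dict[str, dict[str, float]]:
    """Relative abundance of dinucleotides at codon positions 1-2, 2-3 and 3-1.

    Hypothetical scheme: the ORF is read in frame from its first base, bases past
    the last full codon are ignored, and the expected frequency of XY at
    positions (p, q) is f_X(p) * f_Y(q), using base frequencies at each codon
    position. "3-1" pairs the third base of a codon with the first base of the
    next codon, i.e. the dinucleotide spanning the codon junction.
    """
    orf = orf.upper()
    codons = [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]
    n = len(codons)
    if n == 0:
        raise ValueError("ORF is shorter than one codon")
    pos_freq = [Counter(c[k] for c in codons) for k in range(3)]
    spans = {
        "1-2": (0, 1, [c[0] + c[1] for c in codons]),
        "2-3": (1, 2, [c[1] + c[2] for c in codons]),
        "3-1": (2, 0, [a[2] + b[0] for a, b in zip(codons, codons[1:])]),
    }
    result = {}
    for label, (p, q, dinucs) in spans.items():
        counts = Counter(dinucs)
        table = {}
        for xy in DINUCLEOTIDES:
            expected = (pos_freq[p][xy[0]] / n) * (pos_freq[q][xy[1]] / n)
            if not dinucs or not expected:
                table[xy] = float("nan")  # no junctions, or an absent base
            else:
                table[xy] = (counts[xy] / len(dinucs)) / expected
        result[label] = table
    return result
```

For any A/C/G/T sequence s, relative_abundance(reverse_complement(s))["GT"] equals relative_abundance(s)["AC"], which is the identity behind reading a deficit of AC on the coding strand as a deficit of GT on the opposite strand.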

Acknowledgements

This study was supported by the National Natural Science Foundation of China (No. 31572467 and No. 31872425) and the Project Funded by Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.

Author information

Corresponding author

Correspondence to Yong Wang.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Research involving human or animal participants

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by S. Hohmann.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Y., Zeng, Z., Liu, TL. et al. TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes. Mol Genet Genomics 294, 637–647 (2019). https://doi.org/10.1007/s00438-019-01535-1

  • DOI: https://doi.org/10.1007/s00438-019-01535-1
