Abstract
Genomes can be regarded as combinations of the 16 possible dinucleotides, and analysing the relative abundance of these dinucleotides may reveal important features of genome evolution. In the present study, we conducted extensive surveys of the relative abundances of dinucleotides in various genomic components of 28 bacterial, 20 archaeal, 19 fungal, 24 plant and 29 animal species. We found that TA, GT and AC are significantly under-represented in the open reading frames (ORFs) of all organisms examined and in the intergenic regions and introns of most of them. The usage of specific dinucleotides also varies greatly among codon positions. We interpret the significantly low representation of TA, GT and AC as an evolutionary consequence of preventing the formation of premature stop codons (TA begins the stop codons TAA and TAG) and of reducing intron-splicing options (most spliceosomal introns begin with GT) in candidate primary mRNA sequences. Because GT on one strand reads as AC on the complementary strand, these data suggest that TA and GT were depleted on both DNA strands at an early stage of de novo gene birth. Interestingly, GT and AC are also significantly under-represented in present-day prokaryotic genomes, suggesting that ancient prokaryotic protein-coding genes may have contained introns. The greatly varied usage of specific dinucleotides at different codon positions is considered an evolutionary accommodation that compensates for the unavailability of specific codons and avoids the formation of premature stop codons. This is the first report to use dinucleotide relative abundance data to indicate the possible existence of spliceosomal introns in ancient prokaryotic genes and to propose hypothetical early steps of de novo gene birth.
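The abstract does not give the formula used for "relative abundance". A common choice in the dinucleotide literature, and the one assumed here, is the odds ratio ρ_XY = f_XY / (f_X · f_Y), where f_XY is the frequency of dinucleotide XY and f_X, f_Y are mononucleotide frequencies; values below 1 indicate under-representation. The Python sketch below computes ρ for all 16 dinucleotides, optionally pooling counts from the reverse complement (so GT and AC are scored symmetrically), and separately tallies ORF dinucleotides at codon positions 1-2, 2-3 and 3-1. The function names are hypothetical illustrations, not the authors' pipeline, and the ORF function assumes its input starts in frame at a codon's first base.

```python
from __future__ import annotations

from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 dinucleotides: AA, AC, ..., TT
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _count(seq: str) -> tuple[Counter, Counter]:
    """Count mononucleotides and overlapping dinucleotides, skipping non-ACGT symbols."""
    mono = Counter(b for b in seq if b in NUCLEOTIDES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES
    )
    return mono, di


def relative_abundance(seq: str, both_strands: bool = True) -> dict[str, float]:
    """Hypothetical helper: odds ratio rho_XY = f_XY / (f_X * f_Y) per dinucleotide.

    With both_strands=True, counts from the reverse complement are added,
    which forces rho_GT == rho_AC and rho_AG == rho_CT.
    Returns NaN when a constituent nucleotide is absent.
    """
    seq = seq.upper()
    mono, di = _count(seq)
    if both_strands:
        rc_mono, rc_di = _count(seq.translate(_COMPLEMENT)[::-1])
        mono += rc_mono
        di += rc_di
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for d in DINUCLEOTIDES:
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono) if n_mono else 0.0
        observed = di[d] / n_di if n_di else 0.0
        rho[d] = observed / expected if expected else float("nan")
    return rho


def dinucleotides_by_codon_position(orf: str) -> dict[str, Counter]:
    """Hypothetical helper: split an in-frame ORF's dinucleotides by codon position.

    "1-2" and "2-3" lie within a codon; "3-1" spans the third base of one
    codon and the first base of the next.
    """
    orf = orf.upper()
    labels = ("1-2", "2-3", "3-1")
    by_pos = {label: Counter() for label in labels}
    for i in range(len(orf) - 1):
        d = orf[i:i + 2]
        if d[0] in NUCLEOTIDES and d[1] in NUCLEOTIDES:
            by_pos[labels[i % 3]][d] += 1
    return by_pos


if __name__ == "__main__":
    rho = relative_abundance("ATGGCTTAAGTACGT")
    print({d: round(rho[d], 2) for d in ("TA", "GT", "AC")})
    print(dinucleotides_by_codon_position("ATGTAA"))
```

Splitting by codon position matters because the same dinucleotide has different consequences in different frames: for instance, an in-frame TA at positions 1-2 inside an ORF can only occur in the tyrosine codons TAT and TAC, since TAA and TAG are stop codons.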
Acknowledgements
This study was supported by the National Natural Science Foundation of China (No. 31572467 and No. 31872425) and the Project Funded by Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Research involving human or animal participants
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by S. Hohmann.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Wang, Y., Zeng, Z., Liu, TL. et al. TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes. Mol Genet Genomics 294, 637–647 (2019). https://doi.org/10.1007/s00438-019-01535-1
DOI: https://doi.org/10.1007/s00438-019-01535-1