TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes
A genome can be viewed as a sequence of overlapping dinucleotides, each drawn from 16 possible types. Analysing the relative abundance of different dinucleotides may reveal important features of genome evolution. In the present study, we conducted extensive surveys of the relative abundances of dinucleotides in various genomic components of 28 bacterial, 20 archaeal, 19 fungal, 24 plant and 29 animal species. We found that TA, GT and AC are significantly under-represented in the open reading frames of all organisms, and in the intergenic regions and introns of most organisms. The usage of specific dinucleotides varies greatly between codon positions. We interpret the significantly low representation of TA, GT and AC as an evolutionary consequence of two pressures: preventing the formation of premature stop codons, and reducing intron-splicing options in candidate primary mRNA sequences. These data suggest that TA and GT were reduced on both strands of the DNA sequence at an early stage of de novo gene birth. Interestingly, GT and AC are also significantly under-represented in current prokaryotic genomes, suggesting that ancient prokaryotic protein-coding genes might have contained introns. We interpret the large differences in dinucleotide usage between codon positions as evolutionary accommodations that compensate for the unavailability of specific codons and avoid the formation of premature stop codons. This is the first report to use dinucleotide relative abundance data to indicate the possible existence of spliceosomal introns in ancient prokaryotic genes and to propose early steps of de novo gene birth.
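As a rough illustration of the relative-abundance (odds ratio) measure discussed above, the sketch below uses the common definition rho_XY = f_XY / (f_X * f_Y). Here f_XY is the frequency of dinucleotide XY and f_X, f_Y are mononucleotide frequencies, so a value near 1 means XY occurs about as often as base composition alone predicts, and a value well below 1 indicates under-representation. The abstract does not state the exact normalization used in the study, for example whether the complementary strand is pooled. The single-strand counting here, the codon-phase variant, and all function names are therefore hypothetical choices for illustration only. The phase variant assumes the ORF begins on the first base of a codon and takes mononucleotide frequencies from the whole ORF, which is one of several reasonable options.

```python
from collections import Counter
from itertools import product
from typing import Iterable

BASES = "ACGT"


def _odds_ratios(seq: str, starts: Iterable[int]) -> dict[str, float]:
    """rho_XY = f_XY / (f_X * f_Y), counting dinucleotides only at the given start indices.

    Mononucleotide frequencies are taken over the whole sequence (an assumption).
    Characters other than A/C/G/T are ignored.
    """
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2] for i in starts
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("no valid dinucleotides to count")
    ratios = {}
    for x, y in product(BASES, repeat=2):
        fx, fy = mono[x] / n_mono, mono[y] / n_mono
        # Undefined (NaN) when either base is absent from the sequence.
        ratios[x + y] = (di[x + y] / n_di) / (fx * fy) if fx and fy else float("nan")
    return ratios


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Hypothetical helper: odds ratios over all overlapping dinucleotides on one strand."""
    seq = seq.upper()
    return _odds_ratios(seq, range(len(seq) - 1))


def codon_position_odds_ratios(orf: str, phase: int) -> dict[str, float]:
    """Hypothetical helper: odds ratios for dinucleotides at one codon phase.

    phase 0 -> codon positions 1-2, phase 1 -> positions 2-3,
    phase 2 -> position 3 of one codon and position 1 of the next.
    Assumes the ORF starts on the first base of a codon.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    orf = orf.upper()
    return _odds_ratios(orf, range(phase, len(orf) - 1, 3))
```

For example, `codon_position_odds_ratios("ATGTAA", 2)` counts only the GT that spans the boundary between the two codons. This separation by codon phase is how position-specific differences in dinucleotide usage, such as those reported above, would become visible.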
Keywords: Dinucleotide; Composition; Odds ratio; Gene birth; Genome evolution
This study was supported by the National Natural Science Foundation of China (No. 31572467 and No. 31872425) and the Project Funded by Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions.
Compliance with ethical standards
Conflict of interest
All authors declare that they have no conflict of interest.
Research involving human or animal participants
This article does not contain any studies with human participants or animals performed by any of the authors.