Sequences downstream of the start codon and their relations to G + C content and optimal growth temperature in prokaryotic genomes
The mechanism of translation initiation is responsible for shaping the mRNA sequences downstream of the start codon. However, this region has not been systematically analyzed in prokaryotes. We used sequence logos and statistical methods to analyze the patterns of overrepresented sequences in this region for 125 species of bacteria and 23 species of archaea. The specific positions are compared to the first 33 amino acids in the proteins. At the 2nd amino acid position, Lys, Ser or Thr is highly overrepresented for 68% to 84% of the genomes examined and Ala is highly overrepresented for 57% of the genomes. Overrepresentation of Lys2 is negatively correlated with the G + C content of genomes, whereas overrepresentation of Ser2 or Thr2 is positively correlated with it. Ile at the 4th to the 8th positions was found to be overrepresented for 91% of the genomes analyzed, and this seems to be conserved in both bacteria and archaea. Organisms growing at high temperatures show a relatively low extent of nucleotide bias at the 5′ termini of open reading frames (ORFs). The extent to which A is overrepresented and G is underrepresented at ORF 5′ termini is reduced in thermophiles and hyperthermophiles for both archaea and bacteria.
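As an illustration of the kind of position-specific analysis described above, the following is a minimal sketch, not the authors' actual method. It assumes that overrepresentation is measured as the log2 ratio between an amino acid's frequency at one position and its pooled frequency over the first 33 residues of all proteins; the abstract does not state which statistic was used. The function names `gc_content` and `overrepresentation` and the `window` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of a nucleotide sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def overrepresentation(proteins, position, window=33):
    """Return log2(observed / background) for each amino acid seen at `position`.

    `position` is 1-based (position 1 is the initiator Met).
    Assumption: the background is the pooled amino acid frequency over the
    first `window` residues of every protein; the paper may use another one.
    """
    observed = Counter(p[position - 1] for p in proteins if len(p) >= position)
    background = Counter(aa for p in proteins for aa in p[:window])
    n_obs = sum(observed.values())
    n_bg = sum(background.values())
    return {
        aa: log2((count / n_obs) / (background[aa] / n_bg))
        for aa, count in observed.items()
    }
```

A positive score means the residue is more frequent at that position than in the background, and a negative score means it is less frequent. Computing the Lys score at position 2 for each genome and relating it to `gc_content` of that genome would be one way to examine the correlations reported above.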
Keywords: G + C content; Optimal growth temperature; Sequence logos; Translation initiation; Open reading frames
This work was supported by a grant from the National Natural Science Foundation of China (NSFC No. 30200005).