
Codon selection reduces GC content bias in nucleic acids encoding for intrinsically disordered proteins

  • Original Article
Cellular and Molecular Life Sciences

Abstract

Protein-coding nucleic acids show composition and codon biases that differ between sequences coding for intrinsically disordered regions (IDRs) and those coding for structured regions. IDRs are protein regions that cannot fold autonomously and that function without first adopting a folded structure. Several authors have investigated composition bias or codon selection in regions encoding IDRs, primarily in Eukaryota, and concluded that the elevated GC content of these regions results from the biased amino acid composition of IDRs. We substantively extend this work by examining GC content in IDR-encoding regions from 44 species in Eukaryota, Archaea, and Bacteria that span a wide range of GC content. We confirm that regions coding for IDRs have significantly elevated GC content across all three domains of life. Although this is largely attributable to the amino acid composition bias of IDRs, we show that the bias is independent of overall GC content and, most importantly, we are the first to observe that the GC content bias in IDRs differs significantly from that expected from IDR amino acid composition alone. We empirically find compensatory codon selection that reduces the observed GC content bias in IDRs. This selection depends on the overall GC content of the organism. The codon selection bias manifests as the use of infrequent, AT-rich codons to encode IDRs. Further, we find these relationships to be independent of the intrinsic disorder prediction method used and of estimated translation efficiency. These observations are consistent with previous work, and we speculate on whether the observed biases are causal or symptomatic of other driving forces.
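The abstract contrasts the observed GC content of IDR-encoding regions with the GC content expected from their amino acid composition alone. The Python sketch below illustrates one way to set up that comparison; it is not the authors' pipeline. It assumes the standard genetic code, takes "expected" to mean the GC content a region would have if every amino acid were encoded with the synonymous-codon frequencies of a reference set of coding sequences (for example, all CDSs of one genome), and falls back to uniform codon use for amino acids absent from that reference. The function names and the toy sequences are hypothetical, and no significance testing is shown.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def sense_codons(cds: str) -> list[str]:
    """In-frame codons of a coding sequence, keeping only amino-acid codons
    (stop codons, ambiguous bases and a trailing partial codon are dropped)."""
    cds = cds.upper()
    triplets = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return [c for c in triplets if CODE.get(c, "*") != "*"]


def gc_count(seq: str) -> int:
    """Number of G or C bases in an upper-case sequence."""
    return sum(base in "GC" for base in seq)


def observed_gc(cds: str) -> float:
    """Observed G+C fraction over the sense codons of a region."""
    cs = sense_codons(cds)
    return gc_count("".join(cs)) / (3 * len(cs))


def codon_usage(reference_cdss: list[str]) -> Counter:
    """Codon counts over a hypothetical reference set, e.g. all CDSs of a genome."""
    usage: Counter = Counter()
    for cds in reference_cdss:
        usage.update(sense_codons(cds))
    return usage


def expected_gc(cds: str, usage: Counter) -> float:
    """G+C fraction expected from the encoded amino acids alone, assuming each
    amino acid is encoded with the reference synonymous-codon frequencies."""
    cs = sense_codons(cds)
    total = 0.0
    for c in cs:
        syns = SYNONYMS[CODE[c]]
        weights = [usage[s] for s in syns]
        if sum(weights) == 0:  # amino acid absent from the reference: use codons uniformly
            weights = [1] * len(syns)
        total += sum(w * gc_count(s) for w, s in zip(weights, syns)) / sum(weights)
    return total / (3 * len(cs))


# Toy reference: alanine encoded as GCT three times and GCC once.
usage = codon_usage(["GCTGCTGCTGCC"])
print(observed_gc("GCCGCC"), expected_gc("GCCGCC", usage))  # prints: 1.0 0.75
```

In this toy case the region uses GCC, the GC-richer alanine codon that is rarer in the reference, so observed GC exceeds the expected value. A region whose observed GC falls below its expected value would instead reflect the kind of preference for AT-rich synonymous codons that the abstract reports for IDRs.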

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6



Acknowledgements

This research was supported in part by the National Science Foundation Grant 1617369 and the Robert J. Mattauch Endowment from Virginia Commonwealth University to L.K.

Author information

Correspondence to Christopher J. Oldfield or Lukasz Kurgan.


Electronic supplementary material

Supplementary material 1 (PDF 1354 kb)


About this article


Cite this article

Oldfield, C.J., Peng, Z., Uversky, V.N. et al. Codon selection reduces GC content bias in nucleic acids encoding for intrinsically disordered proteins. Cell. Mol. Life Sci. 77, 149–160 (2020). https://doi.org/10.1007/s00018-019-03166-6


  • DOI: https://doi.org/10.1007/s00018-019-03166-6
