Codon selection reduces GC content bias in nucleic acids encoding for intrinsically disordered proteins


Protein-coding nucleic acids differ in nucleotide composition and codon usage between sequences that encode intrinsically disordered regions (IDRs) and those that encode structured regions. IDRs are protein regions that do not fold on their own and that function without requiring a folded structure. Several authors have investigated composition bias or codon selection in regions encoding IDRs, primarily in Eukaryota, and concluded that elevated GC content results from the biased amino acid composition of IDRs. We substantively extend previous work by examining GC content in regions encoding IDRs from 44 species across Eukaryota, Archaea, and Bacteria that span a wide range of genomic GC content. We confirm that regions coding for IDRs show a significantly elevated GC content, and that this holds across all three domains of life. Although the elevation is largely attributable to the amino acid composition bias of IDRs, we show that it is independent of the overall GC content of the organism. Most importantly, we are the first to observe that the GC content bias in IDRs differs significantly from the bias expected from IDR amino acid composition alone. We empirically find compensatory codon selection that reduces the observed GC content bias in IDRs. The strength of this selection depends on the overall GC content of the organism, and it manifests as the use of infrequent, AT-rich codons to encode IDRs. Further, we find these relationships to be independent of the intrinsic disorder prediction method used and of estimated translation efficiency. These observations are consistent with previous work, and we speculate on whether the observed biases are causal or symptomatic of other driving forces.
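The comparison between observed GC content and the GC content expected from amino acid composition alone can be illustrated with a minimal Python sketch. It is not the article's procedure, which is not shown in this preview. It assumes the standard genetic code and, as a simple null model, that every synonymous codon of a residue is used equally often. The function names (`observed_gc`, `expected_gc`) and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered
# TTT, TTC, TTA, TTG, TCT, ... with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide string."""
    return sum(base in "GC" for base in seq) / len(seq)


def sense_codons(cds):
    """Split an in-frame coding sequence into codons, dropping stop codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [c for c in codons if CODON_TABLE[c] != "*"]


def mean_synonymous_gc():
    """Mean GC fraction over the synonymous codons of each amino acid.

    Assumption: uniform synonymous codon usage (illustrative null model).
    """
    by_residue = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_residue.setdefault(aa, []).append(gc_fraction(codon))
    return {aa: sum(vals) / len(vals) for aa, vals in by_residue.items()}


def observed_gc(cds):
    """GC fraction actually present in the sense codons of the sequence."""
    return gc_fraction("".join(sense_codons(cds)))


def expected_gc(cds):
    """GC fraction expected from the encoded amino acids alone."""
    per_residue = mean_synonymous_gc()
    residues = [CODON_TABLE[c] for c in sense_codons(cds)]
    return sum(per_residue[aa] for aa in residues) / len(residues)


# Toy example: Ala-Ala-Lys-Lys encoded with the AT-richer codons GCA and AAA.
toy_cds = "GCAGCAAAAAAA"
difference = observed_gc(toy_cds) - expected_gc(toy_cds)
```

In this toy case the observed GC fraction (4 of 12 bases) lies below the composition-based expectation of 0.5, so `difference` is negative. That is the direction of the compensatory, AT-rich codon choice the abstract describes, although a real analysis would use organism-specific codon frequencies and predicted IDR boundaries.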






This research was supported in part by the National Science Foundation Grant 1617369 and the Robert J. Mattauch Endowment from Virginia Commonwealth University to L.K.

Author information

Correspondence to Christopher J. Oldfield or Lukasz Kurgan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1354 kb)


Cite this article

Oldfield, C.J., Peng, Z., Uversky, V.N. et al. Codon selection reduces GC content bias in nucleic acids encoding for intrinsically disordered proteins. Cell. Mol. Life Sci. 77, 149–160 (2020).

Keywords


  • Intrinsically disordered proteins
  • Amino acid composition
  • GC content
  • Codon selection