Genes & Genomics, Volume 37, Issue 4, pp 365–374

Lengths of coding and noncoding regions of a gene correlate with gene essentiality and rates of evolution

  • Seung-Ho Shin
  • Sun Shim Choi
Research Article


Gene length correlates with the evolutionary rate of a gene's coding sequence. Although the ‘Hill-Robertson (HR) interference’ model has been proposed to explain this correlation, we present an alternative explanation for the relationship between gene length and evolutionary rate. First, genes with longer coding sequences (CDSs) were significantly more essential, evolved more slowly, and contained more functional domains than genes with shorter CDSs. Surprisingly, the same trends held for the lengths of the other gene subcomponents: genes with longer 5′ and 3′ untranslated regions (UTRs) and longer introns were also more essential. In addition, noncoding subcomponents with higher densities of conserved sites were longer. Furthermore, the density of conserved sites in the coding region of a gene was associated with the density of conserved sites in that gene's noncoding regions. Finally, in all five vertebrate species tested, more functionally constrained genes tended to carry longer subcomponents.
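The analyses summarized above are correlations between gene features: the length of a subcomponent against evolutionary rate or essentiality, and the conserved-site density of coding regions against that of noncoding regions. The sketch below shows one way such a rank correlation could be computed in plain Python. It is an illustration only: the abstract does not state which statistic or rate measure the authors used, so Spearman's rho, dN/dS as the rate measure, and the helper names `rank`, `spearman_rho`, and `conserved_density` are assumptions, and the example numbers are made up rather than taken from the study.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx * sy)


def conserved_density(n_conserved, length_bp):
    """Hypothetical helper: conserved sites per base pair of one gene subcomponent."""
    if length_bp <= 0:
        raise ValueError("length must be positive")
    return n_conserved / length_bp


if __name__ == "__main__":
    # Made-up illustrative values, NOT data from the study.
    cds_length_bp = [900, 1500, 3000, 4500, 6000]
    dn_ds = [0.30, 0.22, 0.15, 0.18, 0.08]  # assumed rate measure
    print(spearman_rho(cds_length_bp, dn_ds))
```

With these toy values, longer CDSs pair with lower dN/dS and rho is about -0.9. The same `spearman_rho` call could be applied across genes to per-gene `conserved_density` values for the CDS and for a noncoding subcomponent. A rank-based statistic is used in this sketch because gene lengths are strongly right-skewed, and a Pearson correlation on raw lengths would be more sensitive to a few very long genes.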


Keywords: Gene length; Evolutionary rate; Gene essentiality; Functional constraint



This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2014R1A1A4A01003793). The authors thank Sridhar Hannenhalli for his useful comments.

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1 (DOC 178 kb)



Copyright information

© The Genetics Society of Korea and Springer Science+Business Media 2015

Authors and Affiliations

  1. Department of Medical Biotechnology, College of Biomedical Science, and Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon, South Korea
