Journal of Molecular Evolution, Volume 87, Issue 2–3, pp 93–105

Conserved Critical Evolutionary Gene Structures in Orthologs

  • Miguel A. Fuertes
  • José R. Rodrigo
  • Carlos Alonso
Original Article


Unravelling gene structure requires identifying and understanding the constraints that are often associated with the evolutionary history and functional domains of genes. In this manuscript we speculate on the possible existence in orthologs of an emergent, highly conserved gene structure that might explain their coordinated evolution during speciation events and their parental function. We address the following questions: (1) Is there any conserved hypothetical structure along ortholog gene sequences? (2) If so, are such structures maintained and conserved during speciation events? The data presented provide evidence supporting this hypothesis. We found that (1) most of the orthologs studied share highly conserved compositional structures not observed previously; (2) while the percent identity of the nucleotide sequences of orthologs correlates with the percent identity of their composon sequences, the number of emergent compositional structures conserved during speciation does not correlate with the percent identity; and (3) a broad range of species conserves the emergent compositional stretches. We also discuss the concept of critical gene structure.


Molecular evolution, Triplet-composon, Gene structure, Human–mouse orthologs



This work was funded by a program of the Instituto de Salud Carlos III-Redes Temáticas de Investigación Cooperativa en Salud (ISCIII-RETIC RD06/0021/0008 program) and Laboratorios LETI. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. An institutional grant from Fundación Ramón Areces is also acknowledged.

Supplementary material

239_2019_9889_MOESM1_ESM.docx (65 kb)
Online Resource 1. Sample of mouse genes contained in specific tCP-clusters that differ from those of their human orthologs (DOCX 65 KB)
239_2019_9889_MOESM2_ESM.docx (39 kb)
Online Resource 2. Dataset of human-mouse orthologs that change during speciation from one compositional cluster in mouse to a different one in human (sample 1), showing both NT and tCP alignment data and the number of tCPs conserved per ortholog (DOCX 38 KB)
239_2019_9889_MOESM3_ESM.docx (37 kb)
Online Resource 3. Dataset of human-mouse orthologs that do not change during speciation from one compositional cluster in mouse to a different one in human (sample 2), showing both NT and tCP alignment data and the number of tCPs conserved per ortholog (DOCX 36 KB)
239_2019_9889_MOESM4_ESM.docx (32 kb)
Online Resource 4. Multiple alignment of 12 orthologs of the human sterile alpha motif domain-containing protein 12 (SAMD12). NTs associated with the conserved tCPs are shaded in blue. The * symbol indicates NTs conserved in all species. The interspersed structure is composed of 42 stretches distributed along the gene length. (DOCX 31 KB)
239_2019_9889_MOESM5_ESM.docx (345 kb)
Online Resource 5. Panel comparing the 14 tCP-profiles of the human-mouse ortholog SAMD12. Red and blue lines correspond to tCP-distributions along the trend line of the cumulative tCP-usage profile of the mouse and the human, respectively. The inset in the upper right corner displays the name of the ortholog and the mouse and human tCP-clusters containing the ortholog. The inset in the bottom right corner shows a table with the correlations (r) found between human-mouse tCP-profiles for numerical comparison. Values of r higher than the cut-off are shown in bold. (DOCX 344 KB)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
  2. Telefónica de España S.A., Madrid, Spain
