Abstract
Unravelling gene structure requires identifying and understanding the constraints that are often associated with the evolutionary history and functional domains of genes. In this manuscript we explore the possibility that orthologs share an emergent, highly conserved gene structure that might explain their coordinated evolution during speciation events and their parental function. Here, we address two questions: (1) Is there any conserved hypothetical structure along ortholog gene sequences? (2) If so, is such a structure maintained during speciation events? The data presented provide evidence supporting this hypothesis. We found that (1) most of the orthologs studied share highly conserved compositional structures not observed previously; (2) while the percent identity of the nucleotide sequences of orthologs correlates with the percent identity of their composon sequences, the number of emergent compositional structures conserved during speciation does not correlate with percent identity; and (3) a broad range of species conserves the emergent compositional stretches. We also discuss the concept of a critical gene structure.
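Finding (2) rests on two standard quantities, percent identity and a correlation coefficient, which the abstract does not define further. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the two sequences are already aligned (gaps written as `-`), counts identity over columns in which at least one sequence has a residue, and uses Pearson's r as the correlation measure. The function names `percent_identity` and `pearson_r` are hypothetical.

```python
from math import sqrt
from typing import Sequence


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumption: identity = identical residue columns / columns in which at
    least one sequence has a residue; gap-gap columns are ignored. Other
    conventions (e.g. dividing by the shorter ungapped length) give
    different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no residue from either sequence
        columns += 1
        if x == y:  # both are residues here, since gap-gap columns were skipped
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's product-moment correlation of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)
```

For example, `percent_identity("ACGT-A", "ACGTTA")` gives about 83.3 (5 identical columns out of 6). Applying `pearson_r` to the nucleotide and composon identities of many ortholog pairs would yield the kind of correlation the abstract reports; the strings here are toy inputs, not data from the study.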
Funding
This work was funded by a program of the Instituto de Salud Carlos III-Redes Temáticas de Investigación Cooperativa en Salud (ISCIII-RETIC RD06/0021/0008 program) and Laboratorios LETI. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. An institutional grant from Fundación Ramón Areces is also acknowledged.
Additional information
Handling editor: Hideki Innan.
Electronic supplementary material
Online Resource 1
Sample of mouse genes contained in specific tCP-clusters that differ from those of their human orthologs (DOCX 65 KB)
Online Resource 2
Dataset of human-mouse orthologs that moved during speciation from one compositional cluster in mouse to a different one in human (sample 1), showing both NT and tCP alignment data and the number of tCPs conserved per ortholog (DOCX 38 KB)
Online Resource 3
Dataset of human-mouse orthologs that did not move during speciation from one compositional cluster in mouse to a different one in human (sample 2), showing both NT and tCP alignment data and the number of tCPs conserved per ortholog (DOCX 36 KB)
Online Resource 4
Multiple alignment of 12 orthologs of the human sterile alpha motif domain-containing protein 12 (SAMD12). NTs associated with the conserved tCPs are shaded in blue. The * symbol indicates NTs conserved in all species. The interspersed structure is composed of 42 stretches distributed along the gene length. (DOCX 31 KB)
Online Resource 5
Panel comparing the 14 tCP-profiles of the human-mouse ortholog SAMD12. Red and blue lines correspond to tCP-distributions along the trend line of the cumulative tCP-usage profile of the mouse and the human, respectively. The inset in the upper right corner displays the name of the ortholog and the mouse and human tCP-clusters containing it. The inset in the bottom right corner shows a table of the correlations (r) between human and mouse tCP-profiles for numerical comparison; r values above the cut-off are shown in bold. (DOCX 344 KB)
About this article
Cite this article
Fuertes, M.A., Rodrigo, J.R. & Alonso, C. Conserved Critical Evolutionary Gene Structures in Orthologs. J Mol Evol 87, 93–105 (2019). https://doi.org/10.1007/s00239-019-09889-1
DOI: https://doi.org/10.1007/s00239-019-09889-1