Abstract
The lack of consensus on the biological meaning of genomic entropy and complexity, together with the different ways of assessing them, hampers conclusions about the causes of genomic entropy variation among species. This study evaluates the entropy and complexity of genomic sequences from several species without using homologies, in order to assess relationships among these variables and non-molecular data (e.g., the number of individuals) and to seek a trigger of interspecific genomic entropy variation. The results indicate a relationship among genomic entropy, genome size, genomic complexity, and the number of individuals: species with a small number of individuals harbor large genomes and hence low entropy but higher complexity. We defined the complexity of a genome as depending on the entropy of each DNA segment within the genome. Thus, the entropy and complexity of a genome reflect solely its organization. Exons of vertebrates harbor smaller entropies than non-exon regions (likely because of repeats accumulated from duplications), whereas other taxonomic groups do not show this pattern. Our findings suggest that small initial populations might have defined current genomic entropy and complexity: present-day genomes are less complex than ancestral ones. In addition, our data disagree with the previously established relationship between phenotype and genomic entropy. Finally, establishing the relationship of genomic entropy and complexity with the number of individuals and genome size, from an evolutionary perspective, may give rise to new ideas about genomic variability.
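To make the notion of per-segment entropy concrete, the following is a minimal sketch, not the method used in the study. It assumes Shannon entropy computed from nucleotide frequencies within non-overlapping, fixed-length windows; the abstract does not specify the entropy estimator, the segment length, or how segment entropies are combined into a complexity value, so the function names and the `window` parameter are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbol frequencies in a segment."""
    total = len(segment)
    if total == 0:
        return 0.0
    counts = Counter(segment)
    # H = -sum(p * log2(p)) over the symbols observed in the segment.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def segment_entropies(sequence: str, window: int) -> list[float]:
    """Entropy of each non-overlapping window of length `window` (assumed scheme).

    A trailing fragment shorter than `window` is ignored.
    """
    return [
        shannon_entropy(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, window)
    ]


if __name__ == "__main__":
    # A homopolymer window followed by a window with all four bases.
    print(segment_entropies("AAAAACGT", window=4))
```

Under these assumptions, a repeat-rich window yields an entropy near zero, while a window with evenly mixed bases approaches the 2-bit maximum for a four-letter alphabet.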
Acknowledgements
We thank Dr. Michelle Collins (Max Planck Institute) for English review and comments, and Dr. Rogério Fernandes de Souza (Londrina State University) for comments concerning the evolutionary issues. This research was supported by computational resources supplied by the Center for Scientific Computing (NCC/GridUNESP) of the São Paulo State University (UNESP).
Ethics declarations
Conflict of interest
Rafael Plana Simões declares no conflict of interest. Ivan Rodrigo Wolf declares no conflict of interest. Bruno Afonso Correa declares no conflict of interest. Guilherme Targino Valente declares no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Simões, R.P., Wolf, I.R., Correa, B.A. et al. Uncovering patterns of the evolution of genomic sequence entropy and complexity. Mol Genet Genomics 296, 289–298 (2021). https://doi.org/10.1007/s00438-020-01729-y
DOI: https://doi.org/10.1007/s00438-020-01729-y