To test the hypothesis that the nucleotide sequences of primitive informational polymers may not have been chosen at random, and to compare taxa, we compared computer-generated random sequences with the tRNA nucleotide sequences present in bacterial and archaeal genomes, since tRNA molecules may be "fossils" of the time, billions of years ago, when life arose. Our approach describes tRNA sequences as random walks and evaluates their distances from the origin with nonlinear indices (largest Lyapunov exponent, entropy, and the BDS statistic). We studied six different tRNAs from each of twenty species of Bacteria and Archaea (ten Archaea and ten Bacteria, both thermophilic and mesophilic; n = 120), together with computer-generated random sequences (n = 50). Our data show that the tRNAs have statistically lower index values than the computer-generated random data, i.e. tRNA sequences are more ordered than random ones (Lyapunov exponent, p < 0.01; entropy, p < 0.05; BDS, p < 0.01). This deviation from pure randomness may arise from constraints such as the secondary structure of this biological macromolecule, from a "frozen" stochastic transition, or from a possible peculiar origin of tRNA by replication of an older proto-RNA. Comparing taxa, in the species studied, Bacteria show statistically lower BDS values and a lower base ratio (G+C)/(A+T) than Archaea, together with a 20 % increase in entropy. Analysing a larger number of tRNAs and species will make it possible to establish whether this finding, which indicates greater randomness in bacterial tRNA sequences, is linked to the different base ratio, to the different environments in which the microorganisms live, or to an evolutionary effect.
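The indices described above can be illustrated with a small sketch. It is not the authors' pipeline, and several choices are assumptions: the walk uses a purine/pyrimidine mapping (A, G step −1; C, U step +1), since the abstract does not state how nucleotides become steps; the entropy is a k-mer Shannon entropy chosen for illustration and may differ from the entropy index used in the study; the BDS part computes only the unstandardised difference C_m(ε) − C_1(ε)^m; and the null model draws nucleotides uniformly and independently. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# ASSUMPTION: purine/pyrimidine walk; the abstract does not state the mapping.
# Purines (A, G) step -1; pyrimidines (C, U, and T for DNA-encoded genes) step +1.
STEP = {"A": -1, "G": -1, "C": 1, "U": 1, "T": 1}


def rna_walk(seq):
    """Cumulative walk: element k is the sum of the first k steps (starts at 0)."""
    walk, pos = [0], 0
    for base in seq.upper():
        pos += STEP[base]
        walk.append(pos)
    return walk


def max_distance_from_origin(walk):
    """Largest absolute excursion of the walk from its starting point."""
    return max(abs(v) for v in walk)


def gc_ratio(seq):
    """Base ratio (G+C)/(A+U); T is counted as U, so this equals (G+C)/(A+T)."""
    counts = Counter(seq.upper().replace("T", "U"))
    au = counts["A"] + counts["U"]
    return (counts["G"] + counts["C"]) / au if au else float("inf")


def block_entropy(seq, k=2):
    """Shannon entropy in bits of overlapping k-mers (illustrative stand-in)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in Counter(kmers).values())


def random_sequence(length, rng):
    """ASSUMPTION: uniform i.i.d. null model over A, C, G, U."""
    return "".join(rng.choice("ACGU") for _ in range(length))


def correlation_integral(x, m, eps):
    """Fraction of pairs of m-histories whose max-norm distance is below eps."""
    hist = [x[i:i + m] for i in range(len(x) - m + 1)]
    n = len(hist)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if max(abs(a - b) for a, b in zip(hist[i], hist[j])) < eps
    )
    return 2 * close / (n * (n - 1))


def bds_numerator(x, m=2, eps=0.5):
    """Unstandardised BDS quantity C_m(eps) - C_1(eps)**m; no variance scaling."""
    return correlation_integral(x, m, eps) - correlation_integral(x, 1, eps) ** m


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stand-in: substitute a real tRNA gene sequence here.
    seq = random_sequence(76, rng)
    steps = [STEP[b] for b in seq]
    print("max |distance|:", max_distance_from_origin(rna_walk(seq)))
    print("(G+C)/(A+U):", round(gc_ratio(seq), 3))
    print("2-mer entropy (bits):", round(block_entropy(seq), 3))
    print("BDS numerator, m=2:", round(bds_numerator(steps), 4))
```

The largest Lyapunov exponent is omitted here, and a proper BDS test would divide the difference by its asymptotic standard deviation. Both would be needed before comparing tRNA and random sequences statistically as the study does.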
Keywords: Nonlinear analysis; Genomic sequences; Random walks; tRNA; Molecular evolution; Early evolution of life