Information Theory and Error-Correcting Codes In Genetics and Biological Evolution

  • Gérard Battail


Like semiotics itself, biosemiotics is concerned with semantics. The scientific study of communication engineering, on the other hand, led to the development of information theory, which ignores semantics. For this reason, many biologists thought that it would be useless in their disciplines. It turns out, however, that problems of communication engineering arise in biology and can be dealt with properly only by using information theory. An important example is the faithful transmission of genetic information through the ages, a difficult problem which biologists have overlooked. Accumulated errors in the DNA molecule, due to radiation and even to its own indeterminism as a quantum object, perturb its communication through time. A simple information-theoretic computation shows that, contrary to current belief, genomic memory is ephemeral at the geological time scale. The conventional template-replication paradigm is thus not tenable. According to a fundamental theorem of information theory, error-correcting codes can achieve almost errorless communication provided certain conditions are met. Faithful conservation of genomes can thus be ensured only if they involve error-correcting codes. The genomes can then be recovered with an arbitrarily small probability of error, provided the interval between successive generations is short enough (at the geological time scale) that the number of accumulated errors almost never exceeds the correcting ability of the code.
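The claim that an unprotected memory fades under accumulated errors can be illustrated with a minimal sketch. It is not the paper's own computation: it assumes a binary symmetric channel (a genome uses a four-letter alphabet), independent errors, and a hypothetical per-replication error probability `p`. Under those assumptions, chaining `n` replications gives an effective error probability of (1 - (1 - 2p)^n) / 2, and the capacity 1 - H2(P_n) tends to zero as `n` grows.

```python
import math

def binary_entropy(p):
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def cascaded_error_probability(p, n):
    """Symbol error probability after n binary symmetric channels in series."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** n)

def capacity_after(p, n):
    """Capacity, in bits per symbol, of the cascade of n channels."""
    return 1.0 - binary_entropy(cascaded_error_probability(p, n))

if __name__ == "__main__":
    p = 1e-9  # hypothetical value, chosen only for illustration
    for n in (1, 10**6, 10**8, 10**9, 10**10):
        print(f"n = {n:>12}: capacity = {capacity_after(p, n):.6f} bits/symbol")
```

However small `p` is chosen, the capacity of the cascade shrinks as the number of successive copies grows, which is the qualitative point of the argument above.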

This paper presents an intuitive outline of information theory and error-correcting codes, and briefly reviews the consequences of applying them to the problem of genome conservation. It discusses the possible architecture of genomic error-correcting codes, proposing a layered structure referred to as ‘nested codes’ which protects information unequally: the older and more fundamental the information, the better it is protected. As regards the component codes of this system, we note that the error-correcting ability of codes relies on constraints which tie together the successive symbols of a sequence. In engineering, it is convenient to perform error correction using mathematical constraints implemented by physical means. Nature is assumed to use ‘soft codes’ for this purpose, based on physico-chemical constraints, in addition to the linguistic constraints that genomes need in order to direct the construction and maintenance of phenotypes. The hypotheses that genomic error-correction means exist and take the form of nested codes then suffice to deduce many features of the living world and of its evolution. Some of these features are recognized biological facts, and others answer debated questions. Most of them have no satisfactory explanation in current biology. The theoretical impossibility of genome conservation without error-correcting means makes these consequences as necessary as the hypotheses themselves. Natural error-correcting means have not yet been directly identified, and one cannot expect them to be without the active involvement of practising geneticists. The paper also briefly questions the epistemological status of the engineering concept of information and its possible relation to semantics. Roughly stated, information appears as a necessary container for semantics, providing a bridge between the concrete and the abstract.
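How constraints between symbols make correction possible can be sketched with a textbook engineering code. The example below is the standard Hamming (7,4) code, chosen here only for illustration; it relies on mathematical parity constraints and is not one of the ‘soft codes’ hypothesised for genomes. Three parity bits tie the four data bits together, so a single flipped bit violates a recognisable pattern of constraints and can be located.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d3, d5, d6, d7 = data
    p1 = d3 ^ d5 ^ d7  # constraint over positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7  # constraint over positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7  # constraint over positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def decode(word):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = [0] + list(word)  # shift to 1-based positions
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    position = s1 + 2 * s2 + 4 * s4  # 0 means every constraint holds
    if position:
        c[position] ^= 1
    return [c[3], c[5], c[6], c[7]]

if __name__ == "__main__":
    message = [1, 0, 1, 1]
    received = encode(message)
    received[4] ^= 1  # one error: within the correcting ability
    print(decode(received) == message)
    received[0] ^= 1  # a second error: beyond the correcting ability
    print(decode(received) == message)
```

The second case shows why regeneration must happen before too many errors accumulate: once the error count exceeds what the code can correct, decoding returns a wrong message.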


Biological evolution, error-correcting codes, genome conservation, genomic channel capacity, information theory, nested codes, soft codes





Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Gérard Battail (1)
  1. E.N.S.T., Paris, France (retired)
