The Duplexing of the Genetic Code and Sequence-Dependent DNA Geometry


It is well known that sequences of bases in DNA are translated into sequences of amino acids in cells via the genetic code. More recently, it has been discovered that the sequence of DNA bases also influences the geometry and deformability of the DNA. These two correspondences represent a naturally arising example of duplexed codes, providing two different ways of interpreting the same DNA sequence. This paper sets up the notation and basic results necessary to mathematically investigate the relationship between these two natural DNA codes. It then undertakes two very different investigations of this kind: a graphical approach based only on expected values, and an analytic approach that incorporates the deformability of the DNA molecule and approximates the mutual information of the two codes. Special attention is paid to whether there is evidence that pressure to maximize the duplexing efficiency influenced the evolution of the genetic code. Disappointingly, the results fail to support the hypothesis that the genetic code was influenced in this way. In fact, applying both methods to samples of realistic alternative genetic codes shows that the duplexing of the genetic code found in nature is just slightly less efficient than average. The implications of this negative result are considered in the final section of the paper.



  1. Of course, Alice can similarly send a string of elements from \(\mathcal {S}\) to encode a string of elements from \(\mathcal {X}\), but all of the information about the code is contained in the map \(f\) acting on single elements and so the idea of using longer strings will largely be ignored below.

  2. The specific size of the rectangles will never be specified. All that matters is that they are small enough that they do not intersect.

  3. Because the trigonometric functions are not one-to-one, one must always be careful when working with their “inverses”, which are actually only inverses on specified intervals. The Tait–Bryan system of coordinates for angles is used in this application because it places the endpoints of those intervals far from the orientations that appear in codon geometry. (In contrast, some of the other standard coordinate systems would assign very different angle coordinates to a geometry that bends a tiny amount in one direction from the vertical and to one that bends a tiny amount in another direction.)

  4. Unfortunately, angles in the Hassan–Calladine parameters are measured in degrees rather than radians.

  5. The value of \(\phi \) has no effect when \(\theta _1^2+\theta _2^2=0\). It is merely for convenience that we set \(\phi =0\) in that case.

  6. Here, we are making an assumption of normality. It does seem likely that the probability distribution for each parameter is approximately normal and centered at the expected value. However, even if that is not the case, so long as we are considering the average position over a large number of observations, the assumption should be valid by the Central Limit Theorem.

  7. \(\text {diag}(z_1,\ldots ,z_6)\) denotes the diagonal matrix with the scalar \(z_i\) in the \(i\)th position on the diagonal. Then \(\text {diag}(z_1,\ldots ,z_6){\hat{\mathbf{s}}}(d)\) is a vector whose entries are the standard deviations of the Hassan–Calladine parameters for the dimer \(d\), each scaled by the corresponding z-score.

  8. Since the values of the Tait–Bryan angles are so limited in range in this application, the global topology of SO(3) can be ignored and distances computed merely as if these were points in \(\mathbb {R}^3\). The Mathematica notebook (“”) used to generate the figures and perform the calculations also includes code for a more sophisticated definition that forms an actual metric on the space of rotations, but the results were essentially the same and are not worth the extra complexity in notation required to introduce the necessary definitions.

  9. These graphs are 3-dimensional. The figures included in this journal article are, by necessity, 2-dimensional projections. Since the third coordinate varies very little as compared to the other two, the projection from above has been selected and distances in the figure should be relatively accurate. Still, it is important to note that no such projection was used in the computation of the total length.

  10. The alternative genetic codes considered by Itzkovitz and Alon are a subclass of the ones considered in this paper. In particular, they only consider the ones which can be produced by the composition of a permutation of the first bases, a permutation of the second bases, and a permutation of the third bases.
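
The subclass described in note 10 can be illustrated with a minimal sketch. The Python below is not taken from the paper or its Mathematica notebook; the names `STANDARD`, `permuted_code`, and `all_position_permutations` are hypothetical, and the convention that the permuted codon is looked up in the original code (rather than the inverse) is an assumption made for illustration.

```python
from itertools import permutations, product

# Standard genetic code in the usual TCAG ordering (DNA alphabet, '*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    b1 + b2 + b3: aa
    for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO)
}

def permuted_code(code, p1, p2, p3):
    """Alternative code obtained by permuting bases position by position.

    p1, p2, p3 are dicts mapping each base to a base (one permutation per
    codon position). Assumed convention: codon b1b2b3 receives the amino
    acid that `code` assigns to p1[b1] p2[b2] p3[b3].
    """
    return {
        b1 + b2 + b3: code[p1[b1] + p2[b2] + p3[b3]]
        for b1, b2, b3 in product(BASES, repeat=3)
    }

def all_position_permutations():
    """Yield every triple (p1, p2, p3) of base permutations: 24**3 triples."""
    maps = [dict(zip(BASES, perm)) for perm in permutations(BASES)]
    return product(maps, repeat=3)

if __name__ == "__main__":
    identity = {b: b for b in BASES}
    swap_tc = dict(zip("TCAG", "CTAG"))  # exchange T and C, fix A and G
    alt = permuted_code(STANDARD, swap_tc, identity, identity)
    print(alt["TTT"], STANDARD["TTT"])
```

Every code produced this way assigns the same amino acids with the same multiplicities as the code found in nature; only the codons to which they are attached change. The enumeration counts triples of permutations, so it is an upper bound on the number of distinct codes in the subclass.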


  1. Alexander RW, Schimmel P (2001) Wobble hypothesis. In: Brenner S, Miller JH (eds) Encyclopedia of genetics. Elsevier, Amsterdam

  2. Barrell BG, Bankier AT, Drouin J (1979) A different genetic code in human mitochondria. Nature 282:189–194

  3. Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry, 5th edn. WH Freeman, New York. Section 5.5.1

  4. Eslami-Mossallam B, Schram RD, Tompitak M, van Noort J, Schiessel H (2016) Multiplexing genetic and nucleosome positioning codes: a computational approach. PLoS One 11(6):e0156905

  5. Fujii S, Kono H, Takenaka S, Go N, Sarai A (2007) Sequence-dependent DNA deformability studied using molecular dynamics simulations. Nucleic Acids Res 35(18):6063–6074

  6. Hassan MA, Calladine CR (1995) The assessment of the geometry of dinucleotide steps in double-helical DNA; a new local calculation scheme. J Mol Biol 251:648–664

  7. Itzkovitz S, Alon U (2007) The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res 17(4):405–412

  8. Kawaguchi Y, Honda H, Taniguchi-Morimura J, Iwasaki S (1989) The codon CUG is read as serine in an asporogenic yeast Candida cylindracea. Nature 341:164–166

  9. Kiga D, Sakamoto K, Kodama K, Kigawa T, Matsuda T, Yabuki T, Shirouzu M, Harada Y, Nakayama H, Takio K (2002) An engineered Escherichia coli tyrosyl-tRNA synthetase for site-specific incorporation of an unnatural amino acid into proteins in eukaryotic translation and its application in a wheat germ cell-free system. Proc Natl Acad Sci USA 99:9715–9720

  10. Koonin EV, Novozhilov AS (2017) Origin and evolution of the universal genetic code. Annu Rev Genet 51:45–62

  11. Kumar B, Saini S (2016) Analysis of the optimality of the standard genetic code. Mol BioSyst 12:2642–2651

  12. Lajoie MJ, Söll D, Church GM (2016) Overcoming challenges in engineering the genetic code. J Mol Biol 428(5 Pt B):1004–1021

  13. Lankas F, Sponer J, Langowski J, Cheatham TE III (2003) DNA basepair step deformability inferred from molecular dynamics simulations. Biophys J 85:2872–2883

  14. Liu CC, Schultz PG (2010) Adding new chemistries to the genetic code. Annu Rev Biochem 79:413–444

  15. Matsumoto A, Olson WK (2002) Sequence-dependent motions of DNA: a normal mode analysis at the base-pair level. Biophys J 83:22–41

  16. Olson WK, Gorin AA, Lu XJ, Hock LM, Zhurkin VB (1998) DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci USA 95:11163–11168

  17. Rohs R, West SM, Sosinsky A et al (2009) The role of DNA shape in protein-DNA recognition. Nature 461(7268):1248–1253

  18. Srinivasan G, James CM (2002) Pyrrolysine encoded by UAG in Archaea. Science 296(5572):1459–1462

  19. Wang L, Brock A, Herberich B, Schultz PG (2001) Expanding the genetic code of Escherichia coli. Science 292:498–500

  20. Yamao F, Muto A, Kawauchi Y, Iwami M, Iwagami S, Azumi Y, Osawa S (1985) UGA is read as tryptophan in Mycoplasma capricolum. Proc Natl Acad Sci USA 82:2306–2309

  21. Zhang Z, Yu J (2011) On the organizational dynamics of the genetic code. Genom Proteomics Bioinform 9(1–2):21–29



I am grateful to Jason Cantarella (University of Georgia), Madison Hyer (Medical University of South Carolina), Martin Jones (College of Charleston), Brenton Lemesurier (College of Charleston), Garrett Mitchener (College of Charleston), and Laura Kasman (Medical University of South Carolina) for helpful discussion and feedback. I would also like to thank Wilma Olson and the organizers of the Thematic Year on Mathematics of Molecular and Cellular Biology at the IMA where I met her and first learned about the sequence-dependent geometry of DNA.

Author information



Corresponding author

Correspondence to Alex Kasman.

Kasman, A. The Duplexing of the Genetic Code and Sequence-Dependent DNA Geometry. Bull Math Biol 80, 2734–2760 (2018).


  • Genetic code
  • DNA geometry
  • Mutual information
  • Multiplexing
  • Codons