## Abstract

It is well known that sequences of bases in DNA are translated into sequences of amino acids in cells via the *genetic code*. More recently, it has been discovered that the sequence of DNA bases also influences the geometry and deformability of the DNA. These two correspondences represent a naturally arising example of *duplexed codes*, providing two different ways of interpreting the same DNA sequence. This paper sets up the notation and basic results necessary to mathematically investigate the relationship between these two natural DNA codes. It then undertakes two very different investigations of this kind: a graphical approach based only on expected values, and an analytic approach that incorporates the deformability of the DNA molecule and approximates the mutual information of the two codes. Special attention is paid to whether there is evidence that pressure to maximize the duplexing efficiency influenced the evolution of the genetic code. Disappointingly, the results fail to support the hypothesis that the genetic code was influenced in this way. In fact, applying both methods to samples of realistic alternative genetic codes shows that the duplexing of the genetic code found in nature is just slightly *less* efficient than average. The implications of this negative result are considered in the final section of the paper.

## Notes

Of course, Alice can similarly send a *string* of elements from \(\mathcal {S}\) to encode a string of elements from \(\mathcal {X}\), but all of the information about the code is contained in the map *f* acting on single elements, and so the idea of using longer strings will largely be ignored below.

The specific size of the rectangles will never be specified. All that matters is that they are small enough that they do not intersect.

Because the trigonometric functions are not one-to-one, one must always be careful when working with their “inverses”, which are actually only inverses on specified intervals. The Tait–Bryan system of coordinates for angles is being used in this application because it places the endpoints of those intervals far from the orientations that appear in codon geometry. (In contrast, some of the other standard coordinate systems would assign very different angle coordinates to a geometry that bends a tiny amount in one direction from the vertical and one that bends a tiny amount in another direction.)

Unfortunately, angles in the Hassan–Calladine parameters are measured in degrees rather than radians.

The value of \(\phi \) has no effect when \(\theta _1^2+\theta _2^2=0\). It is merely for convenience that we set \(\phi =0\) in that case.
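The two notes above (degrees versus radians, and the convention \(\phi =0\) when \(\theta _1^2+\theta _2^2=0\)) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that \(\theta _1\) and \(\theta _2\) are two bend components given in degrees and that \(\phi \) is the direction angle of the pair \((\theta _1,\theta _2)\); the function name `bend_magnitude_and_direction` is hypothetical and does not come from the paper or its notebook.

```python
import math

def bend_magnitude_and_direction(theta1_deg, theta2_deg):
    """Return (magnitude, phi) in radians for two bend angles given in degrees.

    ASSUMPTION: phi is taken to be the direction angle atan2(theta2, theta1).
    This definition is illustrative and is not taken from the paper.
    """
    # Convert from degrees (as tabulated) to radians (as used by math functions).
    t1 = math.radians(theta1_deg)
    t2 = math.radians(theta2_deg)
    magnitude = math.hypot(t1, t2)
    if magnitude == 0.0:
        # The direction is undefined when there is no bend; choose phi = 0 by convention.
        phi = 0.0
    else:
        phi = math.atan2(t2, t1)
    return magnitude, phi
```

Making the zero-bend case explicit keeps the convention visible in the code rather than relying on the behavior of the inverse trigonometric function at the origin.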

Here, we are making an assumption of *normality*. It does seem likely that the probability distribution for each parameter is approximately normal and centered at the expected value. However, even if that is not the case, so long as we are considering the average position for a large number of observations, the assumption should be valid by the Central Limit Theorem.

\(\text {diag}(z_1,\ldots ,z_6)\) denotes the diagonal matrix with the scalar \(z_i\) in the *i*th position on the diagonal. Then \(\text {diag}(z_1,\ldots ,z_6){\hat{\mathbf{s}}}(d)\) is a vector whose entries are the standard deviations of the Hassan–Calladine parameters for the dimer *d*, each scaled by the corresponding *z*-score.

Since the values of the Tait–Bryan angles are so limited in range in this application, the global topology of *SO*(3) can be ignored and distances computed merely as if these were points in \(\mathbb {R}^3\). The *Mathematica* notebook (“http://kasmana.people.cofc.edu/DNAGeometry/”) used to generate the figures and perform the calculations also includes code for a more sophisticated definition that forms an actual metric on the space of rotations, but the results were essentially the same and are not worth the extra complexity in notation required to introduce the necessary definitions.

These graphs are 3-dimensional. The figures included in this journal article are, by necessity, 2-dimensional projections. Since the third coordinate varies very little as compared to the other two, the projection from above has been selected and distances in the figure should be relatively accurate. Still, it is important to note that no such projection was used in the computation of the total length.
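The note on distances between orientations can be illustrated with a small Python sketch comparing the two notions of distance for small angles. This is not the code from the notebook. It assumes the common Z-Y-X (yaw, pitch, roll) Tait–Bryan convention, which may differ from the one used in the paper, and it uses the rotation angle of \(A^{T}B\) as one standard metric on *SO*(3). All function names are hypothetical.

```python
import math

def rotation_from_tait_bryan(yaw, pitch, roll):
    """Rotation matrix Rz(yaw) * Ry(pitch) * Rx(roll); angles in radians.

    ASSUMPTION: Z-Y-X ordering, chosen only for illustration.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def geodesic_distance(a, b):
    """Rotation angle (radians) of A^T B for two Tait-Bryan angle triples."""
    ra = rotation_from_tait_bryan(*a)
    rb = rotation_from_tait_bryan(*b)
    # trace(A^T B) equals the sum of entrywise products of A and B.
    trace = sum(ra[i][j] * rb[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding before taking acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

def euclidean_distance(a, b):
    """Distance between angle triples treated as points in R^3."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two nearby orientations (radians); the values are made up for illustration.
a = (0.02, 0.01, -0.03)
b = (0.00, 0.03, 0.01)
print(euclidean_distance(a, b), geodesic_distance(a, b))
```

For angle triples this close to the origin the two printed values should differ only at second order in the angles, which is consistent with the remark that the simpler computation gives essentially the same results.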

The alternative genetic codes considered by Itzkovitz and Alon are a subclass of the ones considered in this paper. In particular, they only consider the ones which can be produced by the composition of a permutation of the first bases, a permutation of the second bases, and a permutation of the third bases.
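The construction described in this note, an alternative code obtained by permuting the bases separately at each codon position, can be sketched in Python as follows. The standard code is written in the usual compact form with bases ordered T, C, A, G; the helper `permuted_code` and the example permutation are hypothetical names introduced only for illustration, and the sketch places no restriction on which permutations are allowed.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

def permuted_code(code, p1, p2, p3):
    """Alternative code sending codon b1b2b3 to code[p1[b1] + p2[b2] + p3[b3]].

    p1, p2, p3 are dicts describing permutations of the four bases,
    one for each codon position.
    """
    return {c: code[p1[c[0]] + p2[c[1]] + p3[c[2]]] for c in code}

identity = {b: b for b in BASES}
swap_tc = {"T": "C", "C": "T", "A": "A", "G": "G"}

# Example: exchange T and C in the first position only.
alt_code = permuted_code(STANDARD_CODE, swap_tc, identity, identity)
```

Because each position is permuted independently, the alternative code assigns the same amino acids to the same number of codons as the standard code; only which codons carry which assignment changes.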

## References

Alexander RW, Schimmel P (2001) Wobble hypothesis. In: Brenner S, Miller JH (eds) Encyclopedia of genetics. Elsevier, Amsterdam

Barrell BG, Bankier AT, Drouin J (1979) A different genetic code in human mitochondria. Nature 282:189–194

Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry, 5th edn. WH Freeman, New York. Section 5.5.1

Eslami-Mossallam B, Schram RD, Tompitak M, van Noort J, Schiessel H (2016) Multiplexing genetic and nucleosome positioning codes: a computational approach. PLoS One 11(6):e0156905. https://doi.org/10.1371/journal.pone.0156905

Fujii S, Kono H, Takenaka S, Go N, Sarai A (2007) Sequence-dependent DNA deformability studied using molecular dynamics simulations. Nucleic Acids Res 35(18):6063–6074

Hassan MA, Calladine CR (1995) The assessment of the geometry of dinucleotide steps in double-helical DNA; a new local calculation scheme. J Mol Biol 251:648–664

Itzkovitz S, Alon U (2007) The genetic code is nearly optimal for allowing additional information within protein-coding sequences. Genome Res 17(4):405–412

Kawaguchi Y, Honda H, Taniguchi-Morimura J, Iwasaki S (1989) The codon CUG is read as serine in an asporogenic yeast Candida cylindracea. Nature 341:164–166

Kiga D, Sakamoto K, Kodama K, Kigawa T, Matsuda T, Yabuki T, Shirouzu M, Harada Y, Nakayama H, Takio K (2002) An engineered Escherichia coli tyrosyl-tRNA synthetase for site-specific incorporation of an unnatural amino acid into proteins in eukaryotic translation and its application in a wheat germ cell-free system. Proc Natl Acad Sci USA 99:9715–9720

Koonin EV, Novozhilov AS (2017) Origin and evolution of the universal genetic code. Annu Rev Genet 51:45–62

Kumara B, Saini S (2016) Analysis of the optimality of the standard genetic code. Mol BioSyst 12:2642–2651

Lajoie MJ, Söll D, Church GM (2016) Overcoming challenges in engineering the genetic code. J Mol Biol 428(5 Pt B):1004–1021

Lankas F, Sponer J, Langowski J, Cheatham TE III (2003) DNA basepair step deformability inferred from molecular dynamics simulations. Biophys J 85:2872–2883

Liu CC, Schultz PG (2010) Adding new chemistries to the genetic code. Annu Rev Biochem 79:413–444

Matsumoto A, Olson WK (2002) Sequence-dependent motions of DNA: a normal mode analysis at the base-pair level. Biophys J 83:22–41

Olson WK, Gorin AA, Lu X-J, Hock LM, Zhurkin VB (1998) DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci USA 95:11163–11168

Rohs R, West SM, Sosinsky A et al (2009) The role of DNA shape in protein-DNA recognition. Nature 461(7268):1248–1253

Srinivasan G, James CM (2002) Pyrrolysine encoded by UAG in Archaea. Science 296(5572):1459–1462

Wang L, Brock A, Herberich B, Schultz PG (2001) Expanding the genetic code of Escherichia coli. Science 292:498–500

Yamao F, Muto A, Kawauchi Y, Iwami M, Iwagami S, Azumi Y, Osawa S (1985) UGA is read as tryptophan in Mycoplasma capricolum. Proc Natl Acad Sci USA 82:2306–2309

Zhang Z, Yu J (2011) On the organizational dynamics of the genetic code. Genom Proteomics Bioinform 9(1–2):21–29

## Acknowledgements

I am grateful to Jason Cantarella (University of Georgia), Madison Hyer (Medical University of South Carolina), Martin Jones (College of Charleston), Brenton Lemesurier (College of Charleston), Garrett Mitchener (College of Charleston), and Laura Kasman (Medical University of South Carolina) for helpful discussion and feedback. I would also like to thank Wilma Olson and the organizers of the Thematic Year on Mathematics of Molecular and Cellular Biology at the IMA where I met her and first learned about the sequence-dependent geometry of DNA.

## About this article

### Cite this article

Kasman, A. The Duplexing of the Genetic Code and Sequence-Dependent DNA Geometry. *Bull Math Biol* **80**, 2734–2760 (2018). https://doi.org/10.1007/s11538-018-0486-3

DOI: https://doi.org/10.1007/s11538-018-0486-3