## Abstract

This study combines biology and mathematics, showing that a relatively simple question from molecular biology can lead to complicated mathematics. The question is how to calculate the number of theoretically possible aliphatic amino acids as a function of the number of carbon atoms in the side chain. The presented calculation is based on earlier results from theoretical chemistry concerning alkyl compounds. Mathematical properties of this number series are highlighted. We discuss which of the theoretically possible structures really occur in living organisms, such as leucine and isoleucine with a chain length of four. This is done both for a strict definition of aliphatic amino acids, involving only carbon and hydrogen atoms in their side chain, and for a less strict definition allowing sulphur, nitrogen and oxygen atoms. While the main focus is on proteinogenic amino acids, we also give several examples of non-proteinogenic aliphatic amino acids that play a role, for instance, in signalling. The results are in agreement with a general phenomenon found in biology: usually, only a small number of molecules are chosen as building blocks to assemble an inconceivable number of different macromolecules such as proteins. Thus, natural biological complexity arises from the multifarious combination of building blocks.

## References

Alberts B, Johnson A, Walter P, Lewis J, Raff M, Roberts K (2007) Molecular biology of the cell, 5th edn. Taylor & Francis, London

Balaban AT, Kennedy JW, Quintas LV (1988) The number of alkanes having n carbons and a longest chain of length d. J Chem Educ 65:304–313

Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry, 5th edn. Freeman, New York

Cayley A (1874) On the mathematical theory of isomers. Phil Mag 67:444–447

Colley KJ, Baenziger JU (1987) Identification of the post-translational modifications of the core-specific lectin. The core-specific lectin contains hydroxyproline, hydroxylysine, and glucosylgalactosylhydroxylysine residues. J Biol Chem 262:10290–10295

Crick FHC (1968) The origin of the genetic code. J Mol Biol 38:367–379

Fowden L, Smith A (1968) Newly characterized amino acids from *Aesculus californica*. Phytochemistry 7:809–819

Gabius HJ (ed) (2009) The sugar code: fundamentals of glycosciences. Wiley VCH, Weinheim

Henze HR, Blair CM (1931) The number of structurally isomeric alcohols of the methanol series. J Am Chem Soc 53:3042–3046

Ivanova V, Oriol M, Montes MJ, García A, Guinea J (2001) Secondary metabolites from a *Streptomyces* strain isolated from Livingston Island, Antarctica. Z Naturforsch C 56:1–5

Kannicht C (2002) Posttranslational modifications of proteins: tools for functional proteomics, 1st edn. In: Kannicht C (ed) Methods in molecular biology, vol 194. Humana, Totowa

Karas V (1954) Systematization of amino acids according to the increasing number of carbon atoms in the main aliphatic chain. Farm Glas 10:138–153 (in Croatian)

Kikuchi T, Kadota S, Hanagaki S, Suehara H, Namba T, Lin CC, Kan WS (1981) Studies on the constituents of orchidaceous plants. I. Constituents of *Nervilia purpurea* Schlechter and *Nervilia aragoana* Gaud. Chem Pharm Bull 29:2073–2078

Kim JK, Samaranayake M, Pradhan S (2009) Epigenetic mechanisms in mammals. Cell Mol Life Sci 66:596–612

Lee BJ, Worland PJ, Davis JN, Stadtman TC, Hatfield DL (1989) Identification of a selenocysteyl-tRNA(Ser) in mammalian cells that recognizes the nonsense codon, UGA. J Biol Chem 264:9724–9727

Leinfelder W, Stadtman TC, Böck A (1989) Occurrence in vivo of selenocysteyl-tRNA (SERUCA) in *Escherichia coli*. Effect of sel mutations. J Biol Chem 264:9720–9723

Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J (2000) Molecular cell biology, 4th edn. Freeman, New York

Miller EJ, Robertson PB (1973) The stability of collagen cross-links when derived from hydroxylysyl residues. Biochem Biophys Res Commun 54:432–439

Miller SL, Urey HC (1959) Organic compound synthesis on the primitive earth. Science 130:245–251

Ming XF, Rajapakse AG, Carvas JM, Ruffieux J, Yang Z (2009) Inhibition of S6K1 accounts partially for the anti-inflammatory effects of the arginase inhibitor L-norvaline. BMC Cardiovasc Disord 9:12–18

Otter R (1948) The number of trees. Ann Mathem 49:583–599

Pólya G (1937) Kombinatorische Anzahlbestimmungen für Gruppen, Graphen und chemische Verbindungen. Acta Math 68:145–254

Riga E, Perry RN, Barrett J, Johnston MRL (1997) Electrophysiological responses of male potato cyst nematodes, *Globodera rostochiensis* and *G. pallida*, to some chemicals. J Chem Ecol 23:417–428

Seligmann H (2003) Cost-minimization of amino acid usage. J Mol Evol 56:151–161

Srinivasan G, James CM, Krzycki JA (2002) Pyrrolysine encoded by UAG in *Archaea*: charging of a UAG-decoding specialized tRNA. Science 296:1459–1462

Taylor WR (1986) The classification of amino acid conservation. J Theor Biol 119:205–218

Tsantrizos YS, Pischos S, Sauriol F (1996) Structural assignment of the peptide antibiotic LP237-F8, a metabolite of *Tolypocladium geodes*. J Organ Chem 61:2118–2121

Wilf H (1994) Generating functionology, 2nd edn. Academic, Boston

## Acknowledgements

We kindly thank Gunnar Brinkmann from the University of Gent, Belgium, for suggestions about the very first ideas for this manuscript. Further acknowledgements go to Dr Ina Weiß and Heike Göbel for literature search. We also thank Christian Bodenstein, who inspired the idea of plotting the carbon ‘investment’ (see Fig. 4).

### Conflict of interests

The authors declare that they have no conflict of interest.

## Appendix

There are (at least) two ways to prove the recurrence formula (4). The “elegant” way uses Pólya’s Enumeration Theorem (Pólya 1937), also known as Redfield-Pólya’s Theorem. We first notice that instead of counting rooted trees with *n* nodes and at most three children at every node, we can also count rooted trees with *exactly* three children at every internal node and *n* internal (non-leaf) vertices. To see this, we simply complete a rooted tree by attaching up to three leaves to every node of the original tree, so that every vertex except the new leaves has exactly three children (Fig. 6). Such trees are called rooted *ternary* trees. Then, we note that the symmetry between such trees arises because the children of any node can be ordered arbitrarily. The mathematical representation of this fact is the symmetric group \( S_3 \). One proceeds by computing the cycle index of \( S_3 \), which is

\[ Z(S_3) = \frac{1}{6}\left( s_1^3 + 3 s_1 s_2 + 2 s_3 \right). \]
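The cycle index of \( S_3 \) can be checked by listing all six permutations of three children and grouping them by cycle type: one identity (three fixed points), three transpositions (one fixed point and one 2-cycle) and two 3-cycles. The following Python sketch is only an illustration; the function names `cycle_type` and `cycle_type_counts` are hypothetical and do not come from the article.

```python
from collections import Counter
from itertools import permutations


def cycle_type(perm):
    """Return the sorted cycle lengths of a permutation given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, node = 0, start
        # Follow the cycle that contains `start` until it closes.
        while node not in seen:
            seen.add(node)
            node = perm[node]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def cycle_type_counts(n):
    """Count the permutations of n elements by cycle type."""
    return Counter(cycle_type(p) for p in permutations(range(n)))


counts = cycle_type_counts(3)
for ctype, number in sorted(counts.items()):
    print(ctype, number)
```

Dividing the three counts by the group order 6 gives the coefficients 1/6, 3/6 and 2/6 of the three monomials in the cycle index.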

It is helpful to consider the *generating function* \( T(z) \) for the number of rooted ternary trees, which is defined as \( T(z) = x_0 z^0 + x_1 z^1 + x_2 z^2 + \ldots \), where the \( x_n \) are exactly the numbers introduced above. Pólya’s Enumeration Theorem (Pólya 1937) then tells us that this function fulfils the functional equation

\[ T(z) = 1 + \frac{z}{6}\left( T(z)^3 + 3\,T(z)\,T(z^2) + 2\,T(z^3) \right). \]

With the functional equation, we can now calculate the coefficient of any power \( z^n \) in the generating function: for example, regarding \( T(z)^3 \), the indices of the three coefficients being multiplied must add up to *n* − 1, since one node is reserved for the root. Doing so, we directly reach Eq. 4. We omit all further detail and refer the reader to any textbook about generating functions (e.g. Wilf 1994).
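The coefficient extraction can also be carried out numerically. The sketch below assumes the functional equation in the standard form \( T(z) = 1 + \frac{z}{6}\left( T(z)^3 + 3T(z)T(z^2) + 2T(z^3) \right) \) with \( x_0 = 1 \), and iterates it on truncated power series; each pass fixes at least one further coefficient. The helper names (`mul`, `substitute_power`, `ternary_tree_series`) are hypothetical and chosen for this illustration only.

```python
def mul(a, b, n):
    """Product of two power series (coefficient lists), truncated after z^n."""
    c = [0] * (n + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j > n:
                break
            c[i + j] += ai * bj
    return c


def substitute_power(a, k, n):
    """Series a(z^k), truncated after z^n."""
    c = [0] * (n + 1)
    for i, ai in enumerate(a):
        if i * k > n:
            break
        c[i * k] = ai
    return c


def ternary_tree_series(n):
    """Coefficients x_0..x_n obtained by iterating the functional equation."""
    t = [1] + [0] * n  # start from T(z) = 1
    for _ in range(n):
        cube = mul(mul(t, t, n), t, n)              # T(z)^3
        mixed = mul(t, substitute_power(t, 2, n), n)  # T(z) T(z^2)
        t3 = substitute_power(t, 3, n)              # T(z^3)
        bracket = [cube[i] + 3 * mixed[i] + 2 * t3[i] for i in range(n + 1)]
        # Multiply by z/6 (shift by one position) and add the constant term 1.
        # The division is exact because the bracket counts multisets six-fold.
        t = [1] + [b // 6 for b in bracket[:n]]
    return t


series = ternary_tree_series(10)
print(series)  # [1, 1, 1, 2, 4, 8, 17, 39, 89, 211, 507]
```

Fixed-point iteration is used here instead of a closed recurrence so that the code mirrors the functional equation term by term.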

Now, we show a direct way to prove Eq. 4. Before doing so, we note that a slightly more complicated way of computing \( x_n \), which ultimately also results in Eq. 4, was described by Henze and Blair (1931). We simplify their presentation and calculate \( x_n \) as follows: To the root node, we may attach three trees such that the subtrees have pairwise different numbers of nodes. We explicitly allow a tree to have zero nodes, in which case we attach nothing; recall that we have defined \( x_0 = 1 \) above. For this case, we do not have to take symmetry into account, since the subtrees must be pairwise different. So, we have

\[ \sum_{\substack{i + j + k = n - 1 \\ i < j < k}} x_i x_j x_k \]

possibilities of doing so. Next, assume that we attach two trees of the same size *j*, and a third tree of a different size *i*. There exist \( \binom{x_j + 1}{2} = \frac{1}{2}\left( x_j + 1 \right) x_j \) ways to choose two trees of size *j*, since this is a combination with repetition. We calculate the number of trees as

\[ \sum_{\substack{i + 2j = n - 1 \\ i \ne j}} x_i \binom{x_j + 1}{2} = \frac{1}{2} \sum_{\substack{i + 2j = n - 1 \\ i \ne j}} x_i x_j^2 + \frac{1}{2} \sum_{\substack{i + 2j = n - 1 \\ i \ne j}} x_i x_j. \tag{10} \]

Finally, all three trees may have the same size *i*. In this case, there exist

\[ \binom{x_i + 2}{3} = \frac{1}{6}\left( x_i + 2 \right)\left( x_i + 1 \right) x_i \]

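ways to choose three trees of size *i*. We calculate

\[ \sum_{3i = n - 1} \binom{x_i + 2}{3} = \frac{1}{6} \sum_{3i = n - 1} x_i^3 + \frac{1}{2} \sum_{3i = n - 1} x_i^2 + \frac{1}{3} \sum_{3i = n - 1} x_i. \tag{12} \]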

Note that if *n* − 1 is not divisible by three, then all of these sums are empty and, by definition, equal zero. Now, we add these three values and sort the terms. First, we collect all sums of products \( x_i x_j x_k \). One can easily see that together they equal \( \frac{1}{6}\sum\limits_{i + j + k = n - 1} x_i x_j x_k \) as desired: this sum runs over ordered index triples, so each choice of three pairwise different sizes appears six times; for the first sum on the right-hand side of Eq. 10, there are three possibilities of choosing which two of the indices *i*, *j*, *k* are equal, as we can permute the indices; and the cubic terms of Eq. 12 appear exactly once. Similarly, we can reproduce the second summand of Eq. 4. Finally, the third summand directly comes from Eq. 12, which completes the proof.
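The case distinction of the direct proof can be checked numerically against the recurrence. The sketch below assumes that Eq. 4, which is not reproduced in this appendix, has the form \( x_n = \frac{1}{6}\sum_{i+j+k=n-1} x_i x_j x_k + \frac{1}{2}\sum_{i+2j=n-1} x_i x_j + \frac{1}{3} x_{(n-1)/3} \), where the last term is present only if 3 divides *n* − 1. The function names are hypothetical.

```python
from math import comb


def count_by_cases(n_max):
    """x_0..x_n_max via the three cases of the direct proof."""
    x = [1]  # x_0 = 1: the empty tree
    for n in range(1, n_max + 1):
        m = n - 1  # nodes to distribute over the three subtrees
        # Case 1: three pairwise different sizes i < j < k.
        distinct = sum(x[i] * x[j] * x[m - i - j]
                       for i in range(m + 1)
                       for j in range(i + 1, m + 1)
                       if m - i - j > j)
        # Case 2: two subtrees of size j and a third of size i != j.
        two_equal = sum(x[m - 2 * j] * comb(x[j] + 1, 2)
                        for j in range(m // 2 + 1)
                        if m - 2 * j != j)
        # Case 3: all three subtrees of size i with 3i = m.
        all_equal = comb(x[m // 3] + 2, 3) if m % 3 == 0 else 0
        x.append(distinct + two_equal + all_equal)
    return x


def count_by_recurrence(n_max):
    """x_0..x_n_max via the assumed form of Eq. 4, kept in integers."""
    x = [1]
    for n in range(1, n_max + 1):
        m = n - 1
        triple = sum(x[i] * x[j] * x[m - i - j]
                     for i in range(m + 1) for j in range(m - i + 1))
        pair = sum(x[m - 2 * j] * x[j] for j in range(m // 2 + 1))
        single = x[m // 3] if m % 3 == 0 else 0
        # triple/6 + pair/2 + single/3 with a common denominator of 6
        x.append((triple + 3 * pair + 2 * single) // 6)
    return x


cases = count_by_cases(12)
recurrence = count_by_recurrence(12)
print(cases[:8])  # [1, 1, 1, 2, 4, 8, 17, 39]
print(cases == recurrence)
```

For a side chain of four carbon atoms, both functions give \( x_4 = 4 \), the four butyl isomers.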

## About this article

### Cite this article

Grützmann, K., Böcker, S. & Schuster, S. Combinatorics of aliphatic amino acids. *Naturwissenschaften* **98**, 79–86 (2011). https://doi.org/10.1007/s00114-010-0743-2

DOI: https://doi.org/10.1007/s00114-010-0743-2

### Keywords

- Aliphatic amino acids
- Aliphatic side chain
- Amino acid signalling
- Enumeration of isomers
- Pólya’s enumeration theorem
- Ternary tree graphs