Abstract
This study combines biology and mathematics, showing that a relatively simple question from molecular biology can lead to complicated mathematics. The question is how to calculate the number of theoretically possible aliphatic amino acids as a function of the number of carbon atoms in the side chain. The presented calculation is based on earlier results from theoretical chemistry concerning alkyl compounds. Mathematical properties of this number series are highlighted. We discuss which of the theoretically possible structures actually occur in living organisms, such as leucine and isoleucine, whose side chains contain four carbon atoms. This is done both for a strict definition of aliphatic amino acids, involving only carbon and hydrogen atoms in the side chain, and for a less strict definition that also allows sulphur, nitrogen and oxygen atoms. While the main focus is on proteinogenic amino acids, we also give several examples of non-proteinogenic aliphatic amino acids, which play a role, for instance, in signalling. The results are in agreement with a general phenomenon found in biology: usually, only a small number of molecules are chosen as building blocks to assemble an immense number of different macromolecules, such as proteins. Thus, natural biological complexity arises from the multifarious combination of building blocks.
References
Alberts B, Johnson A, Walter P, Lewis J, Raff M, Roberts K (2007) Molecular biology of the cell, 5th edn. Taylor & Francis, London
Balaban AT, Kennedy JW, Quintas LV (1988) The number of alkanes having n carbons and a longest chain of length d. J Chem Educ 65:304–313
Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry, 5th edn. Freeman, New York
Cayley A (1874) On the mathematical theory of isomers. Phil Mag 67:444–447
Colley KJ, Baenziger JU (1987) Identification of the post-translational modifications of the core-specific lectin. The core-specific lectin contains hydroxyproline, hydroxylysine, and glucosylgalactosylhydroxylysine residues. J Biol Chem 262:10290–10295
Crick FHC (1968) The origin of the genetic code. J Mol Biol 38:367–379
Fowden L, Smith A (1968) Newly characterized amino acids from Aesculus californica. Phytochemistry 7:809–819
Gabius HJ (ed) (2009) The sugar code: fundamentals of glycosciences. Wiley VCH, Weinheim
Henze HR, Blair CM (1931) The number of structurally isomeric alcohols of the methanol series. J Am Chem Soc 53:3042–3046
Ivanova V, Oriol M, Montes MJ, García A, Guinea J (2001) Secondary metabolites from a Streptomyces strain isolated from Livingston Island, Antarctica. Z Naturforsch C 56:1–5
Kannicht C (2002) Posttranslational modifications of proteins: tools for functional proteomics, 1st ed. In: Kannicht C (ed) Methods in molecular biology, vol 194. Humana, Totowa
Karas V (1954) Systematization of amino acids according to the increasing number of carbon atoms in the main aliphatic chain. Farm Glas 10:138–153 (in Croatian)
Kikuchi T, Kadota S, Hanagaki S, Suehara H, Namba T, Lin CC, Kan WS (1981) Studies on the constituents of orchidaceous plants. I. Constituents of Nervilia purpurea Schlechter and Nervilia aragoana Gaud. Chem Pharm Bull 29:2073–2078
Kim JK, Samaranayake M, Pradhan S (2009) Epigenetic mechanisms in mammals. Cell Mol Life Sci 66:596–612
Lee BJ, Worland PJ, Davis JN, Stadtman TC, Hatfield DL (1989) Identification of a selenocysteyl-tRNA(Ser) in mammalian cells that recognizes the nonsense codon, UGA. J Biol Chem 264:9724–9727
Leinfelder W, Stadtman TC, Böck A (1989) Occurrence in vivo of selenocysteyl-tRNA (SERUCA) in Escherichia coli. Effect of sel mutations. J Biol Chem 264:9720–9723
Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J (2000) Molecular cell biology, 4th edn. Freeman, New York
Miller EJ, Robertson PB (1973) The stability of collagen cross-links when derived from hydroxylysyl residues. Biochem Biophys Res Commun 54:432–439
Miller SL, Urey HC (1959) Organic compound synthesis on the primitive earth. Science 130:245–251
Ming XF, Rajapakse AG, Carvas JM, Ruffieux J, Yang Z (2009) Inhibition of S6K1 accounts partially for the anti-inflammatory effects of the arginase inhibitor L-norvaline. BMC Cardiovasc Disord 9:12–18
Otter R (1948) The number of trees. Ann Math 49:583–599
Pólya G (1937) Kombinatorische Anzahlbestimmungen für Gruppen, Graphen und chemische Verbindungen. Acta Math 68:145–254
Riga E, Perry RN, Barrett J, Johnston MRL (1997) Electrophysiological responses of male potato cyst nematodes, Globodera rostochiensis and G. pallida, to some chemicals. J Chem Ecol 23:417–428
Seligmann H (2003) Cost-minimization of amino acid usage. J Mol Evol 56:151–161
Srinivasan G, James CM, Krzycki JA (2002) Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science 296:1459–1462
Taylor WR (1986) The classification of amino acid conservation. J Theor Biol 119:205–218
Tsantrizos YS, Pischos S, Sauriol F (1996) Structural assignment of the peptide antibiotic LP237-F8, a metabolite of Tolypocladium geodes. J Org Chem 61:2118–2121
Wilf H (1994) Generatingfunctionology, 2nd edn. Academic, Boston
Acknowledgements
We kindly thank Gunnar Brinkmann from the University of Gent, Belgium, for suggestions about the very first ideas for this manuscript. Further acknowledgements go to Dr Ina Weiß and Heike Göbel for literature search. We also thank Christian Bodenstein, who gave us the idea of plotting the carbon ‘investment’ (see Fig. 4).
Conflict of interests
The authors declare that they have no conflict of interest.
Appendix
There are (at least) two ways to prove the recurrence formula (4). The “elegant” way uses Pólya’s Enumeration Theorem (Pólya 1937), also known as the Redfield-Pólya Theorem. We first notice that instead of counting rooted trees with n nodes and at most three children at every node, we can also count rooted trees with n internal (non-leaf) vertices in which every internal vertex has exactly three children. To see this, we complete a rooted tree by attaching up to three leaves to every node of the original tree, so that all vertices except the new leaves have exactly three children (Fig. 6). Such trees are called rooted ternary trees. The symmetry between such trees arises because the children of any node can be ordered arbitrarily. Mathematically, this is captured by the symmetric group \( {S_3} \). One proceeds by computing the cycle index of \( {S_3} \), which is
\[ Z\left( {S_3} \right) = \frac{1}{6}\left( {s_1^3 + 3{s_1}{s_2} + 2{s_3}} \right), \qquad (7) \]
where \( {s_k} \) marks a cycle of length k.
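As an illustrative check, not part of the original derivation, the cycle index of \( {S_3} \) can be obtained by brute force: enumerate all six permutations of the three child positions, record the cycle type of each, and divide the counts by six. The helper names `cycle_type` and `cycle_index_terms` are hypothetical and were introduced only for this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def cycle_type(perm):
    """Return the sorted cycle lengths of a permutation given as a tuple."""
    seen = set()
    lengths = []
    for start in range(len(perm)):
        if start in seen:
            continue
        # Follow the cycle that contains `start` until it closes.
        length = 0
        node = start
        while node not in seen:
            seen.add(node)
            node = perm[node]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def cycle_index_terms(n):
    """Map each cycle type of S_n to its coefficient in the cycle index."""
    counts = Counter(cycle_type(p) for p in permutations(range(n)))
    total = sum(counts.values())  # n! permutations in total
    return {ctype: Fraction(c, total) for ctype, c in counts.items()}


terms = cycle_index_terms(3)
for ctype, weight in sorted(terms.items()):
    print(ctype, weight)
```

The cycle types (1, 1, 1), (1, 2) and (3) receive the weights 1/6, 1/2 and 1/3: one identity, three transpositions and two 3-cycles out of six permutations.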
It is helpful to consider the generating function \( T(z) \) for the number of rooted ternary trees, which is defined as \( T(z) = {x_0}{z^0} + {x_1}{z^1} + {x_2}{z^2} + \ldots \), where the \( {x_n} \) are exactly the numbers introduced above. Pólya’s Enumeration Theorem (Pólya 1937) then tells us that this function fulfils the functional equation
\[ T(z) = 1 + \frac{z}{6}\left( {T{{(z)}^3} + 3\,T(z)\,T\left( {{z^2}} \right) + 2\,T\left( {{z^3}} \right)} \right). \qquad (8) \]
With the functional equation, we can now calculate the coefficient of any power \( {z^n} \) in the generating function: for example, in the term \( T{(z)^3} \), the indices of the three coefficients that are multiplied must add up to n − 1, because of the leading factor z. Doing so, we directly reach Eq. 4. We omit all further detail and refer the reader to any textbook about generating functions (e.g. Wilf 1994).
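The coefficient comparison can be sketched in a few lines of Python. The sketch assumes that the functional equation has the standard form for rooted ternary trees, \( T(z) = 1 + \frac{z}{6}\left( {T{{(z)}^3} + 3T(z)T({z^2}) + 2T({z^3})} \right) \), so that the coefficient of \( {z^n} \) is one sixth of a triple sum over \( i + j + k = n - 1 \), plus one half of a double sum over \( i + 2j = n - 1 \), plus one third of \( {x_{(n - 1)/3}} \) when n − 1 is divisible by three. The function name `alkyl_counts` is hypothetical.

```python
def alkyl_counts(n_max):
    """Return [x_0, ..., x_n_max] by comparing coefficients of z^n.

    Assumes T(z) = 1 + z/6 * (T(z)^3 + 3 T(z) T(z^2) + 2 T(z^3)).
    """
    x = [1]  # x_0 = 1: the empty tree
    for n in range(1, n_max + 1):
        m = n - 1  # the factor z shifts every exponent by one
        # Coefficient of z^m in T(z)^3: ordered triples with i + j + k = m.
        cube = sum(
            x[i] * x[j] * x[m - i - j]
            for i in range(m + 1)
            for j in range(m + 1 - i)
        )
        # Coefficient of z^m in T(z) * T(z^2): pairs with i + 2j = m.
        pair = sum(x[m - 2 * j] * x[j] for j in range(m // 2 + 1))
        # Coefficient of z^m in T(z^3): non-zero only if 3 divides m.
        single = x[m // 3] if m % 3 == 0 else 0
        numerator = cube + 3 * pair + 2 * single
        assert numerator % 6 == 0  # the weighted sum is always an integer
        x.append(numerator // 6)
    return x


print(alkyl_counts(10))
```

For n = 4 the sketch returns 4, the number of structurally different butyl groups; leucine and isoleucine carry two of them as side chains.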
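Now, we show a direct way to prove Eq. 4. Before doing so, we note that a slightly more complicated way of computing \( {x_n} \), which ultimately also results in Eq. 4, was described by Henze and Blair (1931). We simplify their presentation and calculate \( {x_n} \) as follows. The root offers three positions for subtrees, and the sizes of the three subtrees must add up to n − 1. First, we may attach three trees with pairwise different numbers of nodes. We explicitly allow that a tree has zero nodes, in which case we attach nothing; recall that we have defined \( {x_0} = 1 \) above. For this case, we do not have to take symmetry into account, since the subtrees must be pairwise different. So, we have
\[ \sum\limits_{\substack{i + j + k = n - 1 \\ i < j < k}} {{x_i}{x_j}{x_k}} \qquad (9) \]
possibilities of doing so. Next, assume that we attach two trees of the same size j, and a third tree of a different size i. There exist \( \left( {\begin{array}{*{20}{c}} {{x_j} + 1} \\2 \\\end{array} } \right) = \frac{1}{2}\left( {{x_j} + 1} \right){x_j} \) ways to choose two trees of size j, since this is a combination with repetition. We calculate the number of trees as
\[ \sum\limits_{\substack{i + 2j = n - 1 \\ i \ne j}} {\frac{1}{2}\left( {{x_j} + 1} \right){x_j}{x_i}} = \frac{1}{2}\sum\limits_{\substack{i + 2j = n - 1 \\ i \ne j}} {{x_i}x_j^2} + \frac{1}{2}\sum\limits_{\substack{i + 2j = n - 1 \\ i \ne j}} {{x_i}{x_j}}. \qquad (10) \]
Finally, all three trees may have the same size i. In this case, there exist
\[ \left( {\begin{array}{*{20}{c}} {{x_i} + 2} \\3 \\\end{array} } \right) = \frac{1}{6}\left( {{x_i} + 2} \right)\left( {{x_i} + 1} \right){x_i} \qquad (11) \]
ways to choose three trees of size i. We calculate
\[ \sum\limits_{3i = n - 1} {\frac{1}{6}\left( {{x_i} + 2} \right)\left( {{x_i} + 1} \right){x_i}} = \frac{1}{6}\sum\limits_{3i = n - 1} {x_i^3} + \frac{1}{2}\sum\limits_{3i = n - 1} {x_i^2} + \frac{1}{3}\sum\limits_{3i = n - 1} {{x_i}}. \qquad (12) \]
Note that if n − 1 is not divisible by three, then all of these sums are empty and, by definition, equal zero. Now, we add these three values and sort the terms. First, we collect all sums of products \( {x_i}{x_j}{x_k} \). Together they equal \( \frac{1}{6}\sum\limits_{{i + j + k = n - 1}} {{x_i}{x_j}{x_k}} \), as desired: each term of Eq. 9 corresponds to six ordered triples (i, j, k) and so carries the weight 6/6 = 1; for the first sum on the right-hand side of Eq. 10, there are three possibilities of choosing which two of the indices i, j, k are equal, which gives the weight 3/6 = 1/2; and the cubic term of Eq. 12 corresponds to a single ordered triple with weight 1/6. Similarly, the second sum on the right-hand side of Eq. 10 (the case i ≠ j) and the quadratic term of Eq. 12 (the case i = j) together reproduce the second summand of Eq. 4, \( \frac{1}{2}\sum\limits_{{i + 2j = n - 1}} {{x_i}{x_j}} \). Finally, the third summand, \( \frac{1}{3}{x_{(n - 1)/3}} \), comes directly from the linear term of Eq. 12, which completes the proof.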
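The case distinction of the direct proof can also be checked numerically. The following sketch counts the three cases separately (three subtrees of pairwise different sizes, two of equal size, all three of equal size), using combinations with repetition for subtrees of equal size, and adds them up. The function name `alkyl_counts_direct` is hypothetical; the only library call is `math.comb`, which needs Python 3.8 or later.

```python
from math import comb


def alkyl_counts_direct(n_max):
    """Return [x_0, ..., x_n_max] by the three-case count at the root."""
    x = [1]  # x_0 = 1: attaching an empty tree means attaching nothing
    for n in range(1, n_max + 1):
        m = n - 1  # nodes left to distribute over the three subtrees
        # Case 1: pairwise different sizes i < j < k, no symmetry to correct.
        distinct = sum(
            x[i] * x[j] * x[m - i - j]
            for i in range(m + 1)
            for j in range(i + 1, m + 1)
            if m - i - j > j
        )
        # Case 2: two trees of size j (chosen with repetition) and one
        # tree of a different size i = m - 2j.
        two_equal = sum(
            comb(x[j] + 1, 2) * x[m - 2 * j]
            for j in range(m // 2 + 1)
            if m - 2 * j != j
        )
        # Case 3: three trees of the same size, possible only if 3 divides m.
        all_equal = comb(x[m // 3] + 2, 3) if m % 3 == 0 else 0
        x.append(distinct + two_equal + all_equal)
    return x


print(alkyl_counts_direct(10))
```

Because every case is counted with integer binomial coefficients, no division by six is needed here; the fractions 1/6, 1/2 and 1/3 appear only after the terms are regrouped as in the proof.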
Cite this article
Grützmann, K., Böcker, S. & Schuster, S. Combinatorics of aliphatic amino acids. Naturwissenschaften 98, 79–86 (2011). https://doi.org/10.1007/s00114-010-0743-2
DOI: https://doi.org/10.1007/s00114-010-0743-2