Combinatorics of aliphatic amino acids

Abstract

This study combines biology and mathematics, showing that a relatively simple question from molecular biology can lead to complicated mathematics. The question is how to calculate the number of theoretically possible aliphatic amino acids as a function of the number of carbon atoms in the side chain. The presented calculation is based on earlier results from theoretical chemistry concerning alkyl compounds. Mathematical properties of this number series are highlighted. We discuss which of the theoretically possible structures really occur in living organisms, such as leucine and isoleucine, whose side chains contain four carbon atoms. This is done both for a strict definition of aliphatic amino acids, involving only carbon and hydrogen atoms in the side chain, and for a less strict definition allowing sulphur, nitrogen and oxygen atoms. While the main focus is on proteinogenic amino acids, we also give several examples of non-proteinogenic aliphatic amino acids that play a role, for instance, in signalling. The results are in agreement with a general phenomenon found in biology: usually, only a small number of molecules are chosen as building blocks to assemble an inconceivable number of different macromolecules, such as proteins. Thus, natural biological complexity arises from the multifarious combination of building blocks.

References

  • Alberts B, Johnson A, Walter P, Lewis J, Raff M, Roberts K (2007) Molecular biology of the cell, 5th edn. Taylor & Francis, London

  • Balaban AT, Kennedy JW, Quintas LV (1988) The number of alkanes having n carbons and a longest chain of length d. J Chem Educ 65:304–313

  • Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry, 5th edn. Freeman, New York

  • Cayley A (1874) On the mathematical theory of isomers. Phil Mag 67:444–447

  • Colley KJ, Baenziger JU (1987) Identification of the post-translational modifications of the core-specific lectin. The core-specific lectin contains hydroxyproline, hydroxylysine, and glucosylgalactosylhydroxylysine residues. J Biol Chem 262:10290–10295

  • Crick FHC (1968) The origin of the genetic code. J Mol Biol 38:367–379

  • Fowden L, Smith A (1968) Newly characterized amino acids from Aesculus californica. Phytochemistry 7:809–819

  • Gabius HJ (ed) (2009) The sugar code: fundamentals of glycosciences. Wiley VCH, Weinheim

  • Henze HR, Blair CM (1931) The number of structurally isomeric alcohols of the methanol series. J Am Chem Soc 53:3042–3046

  • Ivanova V, Oriol M, Montes MJ, García A, Guinea J (2001) Secondary metabolites from a Streptomyces strain isolated from Livingston Island, Antarctica. Z Naturforsch C 56:1–5

  • Kannicht C (2002) Posttranslational modifications of proteins: tools for functional proteomics, 1st edn. In: Kannicht C (ed) Methods in molecular biology, vol 194. Humana, Totowa

  • Karas V (1954) Systematization of amino acids according to the increasing number of carbon atoms in the main aliphatic chain. Farm Glas 10:138–153 (in Croatian)

  • Kikuchi T, Kadota S, Hanagaki S, Suehara H, Namba T, Lin CC, Kan WS (1981) Studies on the constituents of orchidaceous plants. I. Constituents of Nervilia purpurea Schlechter and Nervilia aragoana Gaud. Chem Pharm Bull 29:2073–2078

  • Kim JK, Samaranayake M, Pradhan S (2009) Epigenetic mechanisms in mammals. Cell Mol Life Sci 66:596–612

  • Lee BJ, Worland PJ, Davis JN, Stadtman TC, Hatfield DL (1989) Identification of a selenocysteyl-tRNA(Ser) in mammalian cells that recognizes the nonsense codon, UGA. J Biol Chem 264:9724–9727

  • Leinfelder W, Stadtman TC, Böck A (1989) Occurrence in vivo of selenocysteyl-tRNA (SERUCA) in Escherichia coli. Effect of sel mutations. J Biol Chem 264:9720–9723

  • Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J (2000) Molecular cell biology, 4th edn. Freeman, New York

  • Miller EJ, Robertson PB (1973) The stability of collagen cross-links when derived from hydroxylysyl residues. Biochem Biophys Res Commun 54:432–439

  • Miller SL, Urey HC (1959) Organic compound synthesis on the primitive earth. Science 130:245–251

  • Ming XF, Rajapakse AG, Carvas JM, Ruffieux J, Yang Z (2009) Inhibition of S6K1 accounts partially for the anti-inflammatory effects of the arginase inhibitor L-norvaline. BMC Cardiovasc Disord 9:12–18

  • Otter R (1948) The number of trees. Ann Math 49:583–599

  • Pólya G (1937) Kombinatorische Anzahlbestimmungen für Gruppen, Graphen und chemische Verbindungen. Acta Math 68:145–254

  • Riga E, Perry RN, Barrett J, Johnston MRL (1997) Electrophysiological responses of male potato cyst nematodes, Globodera rostochiensis and G. pallida, to some chemicals. J Chem Ecol 23:417–428

  • Seligmann H (2003) Cost-minimization of amino acid usage. J Mol Evol 56:151–161

  • Srinivasan G, James CM, Krzycki JA (2002) Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science 296:1459–1462

  • Taylor WR (1986) The classification of amino acid conservation. J Theor Biol 119:205–218

  • Tsantrizos YS, Pischos S, Sauriol F (1996) Structural assignment of the peptide antibiotic LP237-F8, a metabolite of Tolypocladium geodes. J Org Chem 61:2118–2121

  • Wilf H (1994) Generatingfunctionology, 2nd edn. Academic, Boston


Acknowledgements

We thank Gunnar Brinkmann from the University of Gent, Belgium, for suggestions on the very first ideas for this manuscript. Further acknowledgements go to Dr Ina Weiß and Heike Göbel for the literature search. We also thank Christian Bodenstein, who inspired the idea of plotting the carbon ‘investment’ (see Fig. 4).

Conflict of interests

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Konrad Grützmann.

Appendix

There are (at least) two ways to prove the recurrence formula (4). The “elegant” way uses Pólya’s Enumeration Theorem (Pólya 1937), also known as the Redfield–Pólya Theorem. We first notice that instead of counting rooted trees with n nodes and at most three children at every node, we can also count rooted trees with exactly three children at every internal node and n internal (non-leaf) vertices. To see this, we complete a rooted tree by attaching up to three leaves to every node of the original tree, so that every vertex except the new leaves has exactly three children (Fig. 6). Such trees are called rooted ternary trees. Next, we note that the symmetry between such trees arises because the children of any node may be ordered arbitrarily. Mathematically, this symmetry is described by the symmetric group \( S_3 \). One proceeds by computing the cycle index of \( S_3 \), which is

$$ Z(S_3) = \frac{1}{6}\left( a_1^3 + 3 a_1 a_2 + 2 a_3 \right). $$

Fig. 6 Illustration of the replacement of rooted trees with n nodes and at most three children at every node (a) by rooted trees with exactly three children at every node, and n internal (non-leaf) vertices (b). The nodes of the original tree (circles) become internal nodes of the new tree by adding new leaves (squares)
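The cycle index of the symmetric group on three objects can be checked by brute force. The following Python sketch enumerates the six permutations of three objects and groups them by cycle type; the function names `cycle_type` and `cycle_type_counts` are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import permutations


def cycle_type(perm):
    """Return the sorted cycle lengths of a permutation given as a tuple."""
    seen = set()
    lengths = []
    for start in range(len(perm)):
        if start in seen:
            continue
        length = 0
        node = start
        # Follow the cycle that contains `start` until it closes.
        while node not in seen:
            seen.add(node)
            node = perm[node]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths))


def cycle_type_counts(n):
    """Count the permutations of n objects by cycle type."""
    return Counter(cycle_type(p) for p in permutations(range(n)))


counts = cycle_type_counts(3)
# (1, 1, 1) corresponds to a_1^3, (1, 2) to a_1 a_2, and (3,) to a_3.
for cycle_lengths, number in sorted(counts.items()):
    print(cycle_lengths, number)
```

The three counts are 1, 3 and 2, which are the coefficients of the three monomials in the cycle index; dividing by the group order 6 gives the prefactor.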

It is helpful to consider the generating function \( T(z) \) for the number of rooted ternary trees, which is defined as \( T(z) = x_0 z^0 + x_1 z^1 + x_2 z^2 + \ldots \), where the coefficients \( x_n \) are exactly the numbers introduced above. Pólya’s Enumeration Theorem (Pólya 1937) then tells us that this function fulfils the functional equation

$$ T(z) = 1 + \frac{z}{6}\left[ T(z)^3 + 3\,T(z)\,T\left( z^2 \right) + 2\,T\left( z^3 \right) \right] \tag{8} $$
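One way to see that Eq. 8 determines the coefficients uniquely is to iterate it on truncated power series. The sketch below is an illustration under stated assumptions, not the method used in the paper: it starts from \( T(z) = 1 \), applies the right-hand side of Eq. 8 repeatedly, and discards all powers above a chosen cut-off. The helper names are hypothetical. Each pass fixes one further coefficient, because the prefactor z shifts every term up by one power.

```python
from fractions import Fraction


def multiply(a, b, n_max):
    """Multiply two coefficient lists, discarding powers above z^n_max."""
    out = [Fraction(0)] * (n_max + 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += a_i * b_j
    return out


def substitute_power(a, m, n_max):
    """Return the coefficients of A(z^m), truncated at z^n_max."""
    out = [Fraction(0)] * (n_max + 1)
    for i, a_i in enumerate(a):
        if i * m > n_max:
            break
        out[i * m] = a_i
    return out


def solve_functional_equation(n_max):
    """Iterate Eq. 8 on truncated series, starting from T(z) = 1."""
    t = [Fraction(1)] + [Fraction(0)] * n_max
    for _ in range(n_max):
        cube = multiply(multiply(t, t, n_max), t, n_max)
        mixed = multiply(t, substitute_power(t, 2, n_max), n_max)
        pure = substitute_power(t, 3, n_max)
        bracket = [c + 3 * p + 2 * q for c, p, q in zip(cube, mixed, pure)]
        # Multiplying by z / 6 shifts every coefficient up by one power.
        t = [Fraction(1)] + [b / 6 for b in bracket[:n_max]]
    return [int(c) for c in t]


print(solve_functional_equation(7))  # [1, 1, 1, 2, 4, 8, 17, 39]
```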

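With the functional equation, we can now calculate the coefficient of any power \( z^n \) in the generating function. For example, in the term \( z\,T(z)^3 \), the indices i, j, k of the three coefficients being multiplied must add up to n − 1, because the prefactor z contributes the remaining power. In the same way, the term \( z\,T(z)\,T\left( z^2 \right) \) leads to the condition i + 2j = n − 1, and \( z\,T\left( z^3 \right) \) to 3i = n − 1. Doing so, we directly reach Eq. 4. We omit all further detail and refer the reader to any textbook about generating functions (e.g. Wilf 1994).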
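Comparing coefficients in Eq. 8 gives a recurrence that can be evaluated directly. The following Python sketch implements that recurrence as read off from Eq. 8; it should coincide with Eq. 4 of the main text, and the function name `alkyl_counts` is an illustrative choice.

```python
def alkyl_counts(n_max):
    """Return [x_0, ..., x_n_max] from the coefficient comparison in Eq. 8."""
    x = [1]  # x_0 = 1: the empty tree
    for n in range(1, n_max + 1):
        m = n - 1
        # Coefficient of z^m in T(z)^3: all ordered triples with i + j + k = m.
        triple = sum(
            x[i] * x[j] * x[m - i - j]
            for i in range(m + 1)
            for j in range(m + 1 - i)
        )
        # Coefficient of z^m in T(z) T(z^2): all pairs with i + 2j = m.
        pair = sum(x[m - 2 * j] * x[j] for j in range(m // 2 + 1))
        # Coefficient of z^m in T(z^3): non-zero only if 3i = m.
        single = x[m // 3] if m % 3 == 0 else 0
        x.append((triple + 3 * pair + 2 * single) // 6)
    return x


print(alkyl_counts(7))  # [1, 1, 1, 2, 4, 8, 17, 39]
```

For n = 4, the three sums are 13, 3 and 1, so that \( x_4 = (13 + 3 \cdot 3 + 2 \cdot 1)/6 = 4 \).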

Now, we show a direct way to prove Eq. 4. Before doing so, we note that a slightly more complicated way of computing \( x_n \), which ultimately also results in Eq. 4, was described by Henze and Blair in 1931. We simplify their presentation to calculate \( x_n \) as follows. First, to any node we may attach three subtrees with pairwise different numbers of nodes. We explicitly allow a subtree to have zero nodes, in which case we attach nothing; recall that we have defined \( x_0 = 1 \) above. In this case, we do not have to take symmetry into account, since the subtrees must be pairwise different. So, we have

$$ \sum_{\substack{i + j + k = n - 1 \\ i < j < k}} x_i x_j x_k = \frac{1}{6} \sum_{\substack{i + j + k = n - 1 \\ i \ne j,\, j \ne k,\, k \ne i}} x_i x_j x_k \tag{9} $$

possibilities of doing so. Next, assume that we attach two trees of the same size j, and a third tree of size i ≠ j. There exist \( \binom{x_j + 1}{2} = \frac{1}{2}\left( x_j + 1 \right) x_j \) ways to choose two trees of size j, since this is a combination with repetition. We calculate the number of trees for this case as

$$ \sum_{\substack{i \ne j \\ i + 2j = n - 1}} x_i \binom{x_j + 1}{2} = \frac{1}{2}\left( \sum_{\substack{i \ne j = k \\ i + j + k = n - 1}} x_i x_j x_k + \sum_{\substack{i \ne j \\ i + 2j = n - 1}} x_i x_j \right) \tag{10} $$

Finally, all three trees may have the same size i. In this case, there exist

$$ \binom{x_i + 2}{3} = \frac{1}{6}\left( x_i + 2 \right)\left( x_i + 1 \right) x_i = \frac{1}{6}\left( x_i^3 + 3x_i^2 + 2x_i \right) \tag{11} $$

ways to choose three trees of size i. We calculate

$$ \sum_{3i = n - 1} \binom{x_i + 2}{3} = \frac{1}{6}\left( \sum_{\substack{i = j = k \\ i + j + k = n - 1}} x_i x_j x_k + 3\sum_{\substack{i = j \\ i + 2j = n - 1}} x_i x_j + 2\sum_{3i = n - 1} x_i \right) \tag{12} $$
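The binomial coefficients for combinations with repetition can be confirmed by enumeration. In the sketch below, m stands for the number of distinguishable trees of a given size (the role played by \( x_j \) or \( x_i \) in the text); the helper `count_multisets` is a hypothetical name introduced for illustration.

```python
from itertools import combinations_with_replacement
from math import comb


def count_multisets(m, r):
    """Count unordered selections of r trees, repetition allowed, from m trees."""
    return sum(1 for _ in combinations_with_replacement(range(m), r))


for m in range(6):
    pairs = count_multisets(m, 2)    # compare with binom(m + 1, 2)
    triples = count_multisets(m, 3)  # compare with binom(m + 2, 3)
    print(m, pairs, comb(m + 1, 2), triples, comb(m + 2, 3))
```

In every row, the enumerated counts agree with the closed forms used for two and three subtrees of equal size.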

Note that if n − 1 is not divisible by three, then all of the sums in Eq. 12 are empty and, by definition, equal zero. Now, we add the three values from Eqs. 9, 10 and 12 and sort the terms. First, we collect all sums of products \( x_i x_j x_k \). One can easily see that together they equal \( \frac{1}{6}\sum_{i + j + k = n - 1} x_i x_j x_k \), as desired: Eq. 9 covers the triples with pairwise different indices and Eq. 12 those with three equal indices, and for the first sum on the right-hand side of Eq. 10 there are three possibilities of choosing which two of the indices i, j, k are equal, so that the factor 1/2 corresponds to 3/6. Similarly, the second sum of Eq. 10 (i ≠ j) and the second sum of Eq. 12 (i = j), with factors 1/2 and 3/6, combine to reproduce the second summand of Eq. 4. Finally, the third summand comes directly from Eq. 12, which completes the proof.
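The case distinction of the direct proof can also be evaluated numerically. The following Python sketch adds the left-hand sides of Eqs. 9, 10 and 12 for each n, using the binomial coefficients for combinations with repetition given in the text; the function names are illustrative and not taken from the paper.

```python
from math import comb


def count_by_cases(x, n):
    """Count trees with n nodes from x_0..x_{n-1} via the three cases."""
    m = n - 1
    # Case 1: three subtrees of pairwise different sizes i < j < k.
    distinct = sum(
        x[i] * x[j] * x[m - i - j]
        for i in range(m + 1)
        for j in range(i + 1, m + 1)
        if m - i - j > j
    )
    # Case 2: two subtrees of equal size j and a third of size i != j.
    two_equal = sum(
        x[m - 2 * j] * comb(x[j] + 1, 2)
        for j in range(m // 2 + 1)
        if m - 2 * j != j
    )
    # Case 3: three subtrees of equal size i, possible only if 3i = n - 1.
    all_equal = comb(x[m // 3] + 2, 3) if m % 3 == 0 else 0
    return distinct + two_equal + all_equal


def alkyl_counts_by_cases(n_max):
    """Return [x_0, ..., x_n_max] built from the three-case count."""
    x = [1]  # x_0 = 1: the empty tree
    for n in range(1, n_max + 1):
        x.append(count_by_cases(x, n))
    return x


print(alkyl_counts_by_cases(7))  # [1, 1, 1, 2, 4, 8, 17, 39]
```

For n = 4, the three cases contribute 1, 2 and 1, giving \( x_4 = 4 \).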

About this article

Cite this article

Grützmann, K., Böcker, S. & Schuster, S. Combinatorics of aliphatic amino acids. Naturwissenschaften 98, 79–86 (2011). https://doi.org/10.1007/s00114-010-0743-2

  • DOI: https://doi.org/10.1007/s00114-010-0743-2

Keywords

  • Aliphatic amino acids
  • Aliphatic side chain
  • Amino acid signalling
  • Enumeration of isomers
  • Pólya’s enumeration theorem
  • Ternary tree graphs